1. 03 Jan, 2017 2 commits
  2. 18 Nov, 2016 1 commit
    • RDS: TCP: Track peer's connection generation number · 905dd418
      Committed by Sowmini Varadhan
      The RDS transport has to be able to distinguish between
      two types of failure events:
      (a) when the transport fails (e.g., TCP connection reset)
          but the RDS socket/connection layer on both sides stays
          the same
      (b) when the peer's RDS layer itself resets (e.g., due to module
          reload or machine reboot at the peer)
      In case (a) both sides must reconnect and continue the RDS messaging
      without any message loss or disruption to the message sequence numbers,
      and this is achieved by rds_send_path_reset().
      
      In case (b) we should reset all rds_connection state to the
      new incarnation of the peer. Examples of state that needs to
      be reset are next expected rx sequence number from, or messages to be
      retransmitted to, the new incarnation of the peer.
      
      To achieve this, the RDS handshake probe added as part of
      commit 5916e2c1 ("RDS: TCP: Enable multipath RDS for TCP")
      is enhanced so that sender and receiver of the RDS ping-probe
      will add a generation number as part of the RDS_EXTHDR_GEN_NUM
      extension header. Each peer stores local and remote generation
      numbers as part of each rds_connection. Changes in generation
      number will be detected via incoming handshake probe ping
      request or response and will allow the receiver to reset rds_connection
      state.
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
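The generation-number check described in this commit message can be sketched in plain C. This is a minimal userspace illustration, not the kernel code: `struct conn_state`, `handle_peer_gen()`, `INITIAL_SEQ`, and the convention that a stored generation of 0 means "no probe seen yet" are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed starting sequence number after a reset (illustrative only). */
#define INITIAL_SEQ 1u

/* Hypothetical, simplified stand-in for per-connection state. */
struct conn_state {
    uint32_t local_gen;       /* our own generation, sent in each probe */
    uint32_t peer_gen;        /* last generation seen from the peer; 0 = none yet */
    uint64_t next_rx_seq;     /* next expected receive sequence number */
    unsigned retransmit_len;  /* number of messages queued for retransmission */
};

/*
 * Process the generation number carried by an incoming handshake probe
 * (request or response). Returns true if connection state was reset
 * because the peer is a new incarnation.
 */
static bool handle_peer_gen(struct conn_state *c, uint32_t probe_gen)
{
    assert(c != NULL);

    if (c->peer_gen == probe_gen)
        return false;               /* case (a): same peer incarnation */

    bool had_peer = (c->peer_gen != 0);
    c->peer_gen = probe_gen;
    if (!had_peer)
        return false;               /* first probe: nothing to reset */

    /* case (b): the peer's RDS layer restarted, so drop per-peer state */
    c->next_rx_seq = INITIAL_SEQ;
    c->retransmit_len = 0;
    return true;
}
```

A transport-only failure leaves the generation unchanged, so sequence numbers and the retransmit queue survive the reconnect; only a changed generation triggers the reset.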
  3. 30 Oct, 2016 1 commit
  4. 17 Oct, 2016 1 commit
  5. 09 Aug, 2016 1 commit
  6. 16 Jul, 2016 1 commit
  7. 02 Jul, 2016 3 commits
  8. 15 Jun, 2016 11 commits
  9. 08 Jun, 2016 1 commit
    • RDS: TCP: fix race windows in send-path quiescence by rds_tcp_accept_one() · 9c79440e
      Committed by Sowmini Varadhan
      The send path needs to be quiesced before resetting callbacks from
      rds_tcp_accept_one(), and commit eb192840 ("RDS:TCP: Synchronize
      rds_tcp_accept_one with rds_send_xmit when resetting t_sock") achieves
      this using the c_state and RDS_IN_XMIT bit following the pattern
      used by rds_conn_shutdown(). However this leaves the possibility
      of a race window as shown in the sequence below
          take t_conn_lock in rds_tcp_conn_connect
          send outgoing syn to peer
          drop t_conn_lock in rds_tcp_conn_connect
          incoming from peer triggers rds_tcp_accept_one, conn is
            marked CONNECTING
          wait for RDS_IN_XMIT to quiesce any rds_send_xmit threads
          call rds_tcp_reset_callbacks
          [.. race-window where incoming syn-ack can cause the conn
            to be marked UP from rds_tcp_state_change ..]
          lock_sock called from rds_tcp_reset_callbacks, and we set
            t_sock to null
      As soon as the conn is marked UP in the race-window above, rds_send_xmit()
      threads will proceed to rds_tcp_xmit and may encounter a null-pointer
      deref on the t_sock.
      
      Given that rds_tcp_state_change() is invoked in softirq context, whereas
      rds_tcp_reset_callbacks() is in workq context, and testing for RDS_IN_XMIT
      after lock_sock could result in a deadlock with tcp_sendmsg, this
      commit fixes the race by using a new c_state, RDS_TCP_RESETTING, which
      will prevent a transition to RDS_CONN_UP from rds_tcp_state_change().
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
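The idea of blocking the transition to "up" with an extra connection state can be sketched with C11 atomics. This is a userspace illustration under stated assumptions, not the kernel implementation: the enum values, `struct conn`, and the helper names `conn_transition()`, `on_tcp_established()`, `begin_reset()`, and `end_reset()` are hypothetical stand-ins for the c_state handling the commit describes.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative connection states; ST_RESETTING plays the role of the new state. */
enum conn_st { ST_DOWN, ST_CONNECTING, ST_UP, ST_RESETTING };

struct conn {
    _Atomic int state;
};

/* Move from 'from' to 'to' only if the state is still 'from'. */
static bool conn_transition(struct conn *c, int from, int to)
{
    int expected = from;
    return atomic_compare_exchange_strong(&c->state, &expected, to);
}

/* State-change side: the TCP socket became established. */
static bool on_tcp_established(struct conn *c)
{
    /* Fails while the accept path holds the connection in ST_RESETTING. */
    return conn_transition(c, ST_CONNECTING, ST_UP);
}

/* Accept side: enter the resetting state before swapping socket callbacks. */
static bool begin_reset(struct conn *c)
{
    return conn_transition(c, ST_CONNECTING, ST_RESETTING);
}

/* Accept side: callbacks are swapped, resume the normal connect flow. */
static bool end_reset(struct conn *c)
{
    return conn_transition(c, ST_RESETTING, ST_CONNECTING);
}
```

Because the established event only succeeds from `ST_CONNECTING`, a syn-ack that arrives while callbacks are being reset cannot mark the connection up, so no sender starts transmitting on a socket pointer that is about to be cleared.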
  10. 03 Mar, 2016 1 commit
  11. 03 Nov, 2015 1 commit
  12. 05 Oct, 2015 1 commit
    • RDS: Use a single TCP socket for both send and receive. · 3b20fc38
      Committed by Sowmini Varadhan
      Commit f711a6ae ("net/rds: RDS-TCP: Always create a new rds_sock
      for an incoming connection.") modified rds-tcp so that an incoming SYN
      would ignore an existing "client" TCP connection which had the local
      port set to the transient port.  The motivation for ignoring the existing
      "client" connection in f711a6ae was to avoid race conditions and an
      endless duel of reconnect attempts triggered by a restart/abort of one
      of the nodes in the TCP connection.
      
      However, having separate sockets for active and passive sides
      is avoidable, and the simpler model of a single TCP socket for
      both sends and receives of all RDS connections associated with
      that TCP socket makes for easier observability. We avoid the race
      conditions from f711a6ae by attempting reconnects in rds_conn_shutdown
      if, and only if, the (new) c_outgoing bit is set for RDS_TRANS_TCP.
      The c_outgoing bit is initialized in __rds_conn_create().
      
      A side-effect of re-using the client rds_connection for an incoming
      SYN is the potential of encountering duelling SYNs, i.e., we
      have an outgoing RDS_CONN_CONNECTING socket when we get the incoming
      SYN. The logic to arbitrate this criss-crossing SYN exchange in
      rds_tcp_accept_one() has been modified to emulate the BGP state
      machine: the smaller IP address should back off from the connection attempt.
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
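The tie-break for duelling SYNs described above can be sketched as a small decision function. This is an illustration only: `ipv4()`, `should_back_off()`, `arbitrate()`, and the `syn_action` enum are hypothetical names, and addresses are assumed to be compared as IPv4 values in host byte order.

```c
#include <stdbool.h>
#include <stdint.h>

/* Build an IPv4 address in host byte order from its four octets. */
static uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
}

/* The side with the smaller address abandons its own connect attempt. */
static bool should_back_off(uint32_t local_addr, uint32_t peer_addr)
{
    return local_addr < peer_addr;
}

enum syn_action { ACCEPT_INCOMING, KEEP_OUTGOING };

/* Decide what to do with an incoming SYN from 'peer_addr'. */
static enum syn_action arbitrate(uint32_t local_addr, uint32_t peer_addr,
                                 bool outgoing_in_progress)
{
    if (!outgoing_in_progress)
        return ACCEPT_INCOMING;     /* no duel: just accept */
    return should_back_off(local_addr, peer_addr) ? ACCEPT_INCOMING
                                                  : KEEP_OUTGOING;
}
```

Both ends evaluate the same comparison with the roles swapped, so exactly one of the two crossing connection attempts survives.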
  13. 01 Oct, 2015 1 commit
    • RDS: Use per-bucket rw lock for bind hash-table · 9b9acde7
      Committed by Santosh Shilimkar
      One global lock protecting a hash table with 1024 buckets isn't
      efficient, and it shows up on massive systems with truckloads
      of RDS sockets serving multiple databases. The perf data clearly
      highlights the contention on the rw lock in these massive workloads.
      
      When the contention gets worse, the code gets into a state where
      it decides to back off on the lock. So while interrupts are disabled,
      it sits and backs off on acquiring this lock. This causes the system to
      become sluggish and eventually all sorts of bad things happen.
      
      The simple fix is to move the lock into the hash bucket and
      use per-bucket lock to improve the scalability.
      Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
      Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
  14. 26 Aug, 2015 1 commit
  15. 08 Aug, 2015 1 commit
  16. 02 Jun, 2015 1 commit
    • rds: re-entry of rds_ib_xmit/rds_iw_xmit · d655a9fb
      Committed by Wengang Wang
      The BUG_ON at line 452/453 is triggered in function rds_send_xmit.
      
       441                         while (ret) {
       442                                 tmp = min_t(int, ret, sg->length -
       443                                                       conn->c_xmit_data_off);
       444                                 conn->c_xmit_data_off += tmp;
       445                                 ret -= tmp;
       446                                 if (conn->c_xmit_data_off == sg->length) {
       447                                         conn->c_xmit_data_off = 0;
       448                                         sg++;
       449                                         conn->c_xmit_sg++;
       450                                         if (ret != 0 && conn->c_xmit_sg == rm->data.op_nents)
       451                                                 printk(KERN_ERR "conn %p rm %p sg %p ret %d\n", conn, rm, sg, ret);
       452                                         BUG_ON(ret != 0 &&
       453                                                conn->c_xmit_sg == rm->data.op_nents);
       454                                 }
       455                         }
      
      It complains that the total length sent is bigger than what we wanted to send.
      
      On the second entry for the same rds_message, rds_ib_xmit() returns the
      wrong value.
      
      The sg and off passed by rds_send_xmit to rds_ib_xmit are based on
      scatterlist.offset/length, but rds_ib_xmit operates on
      scatterlist.dma_address/dma_length. When dma_length is larger than length,
      there is a problem: for the second and later entries into rds_ib_xmit for the
      same rds_message, at least one of the following two is wrong:
      
      1) the scatterlist to start with; the chosen one can be far beyond the correct
         one.
      2) the offset to start with within the scatterlist.
      
      Fix:
      Add op_dmasg and op_dmaoff to the rm_data_op structure, indicating the
      scatterlist entry, and the offset within it, at which rds_ib_xmit should
      start. op_dmasg and op_dmaoff are initialized to zero when DMA mapping is
      first done for the message, and are updated when filling send slots.
      
      The same applies to rds_iw_xmit.
      Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
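The mismatch between the two views of a scatterlist, and why a separate DMA cursor is needed, can be shown with a small sketch. Everything here is hypothetical illustration rather than kernel code: `struct sg_cursor` and `cursor_advance()` stand in for the (entry, offset) pairs, and plain arrays of lengths stand in for scatterlist.length and scatterlist.dma_length. The example assumes the mapping merged entries, which is one way dma_length can exceed length.

```c
#include <assert.h>
#include <stddef.h>

/* A resume position: which entry, and how far into it. */
struct sg_cursor {
    unsigned sg;   /* index of the current entry */
    unsigned off;  /* byte offset within that entry */
};

/* Advance 'cur' by 'nbytes' through a list of 'nents' entry lengths. */
static void cursor_advance(struct sg_cursor *cur, const unsigned *len,
                           unsigned nents, unsigned nbytes)
{
    assert(cur != NULL && len != NULL);

    while (nbytes > 0 && cur->sg < nents) {
        unsigned room = len[cur->sg] - cur->off;
        unsigned step = nbytes < room ? nbytes : room;

        cur->off += step;
        nbytes -= step;
        if (cur->off == len[cur->sg]) {  /* entry consumed, move on */
            cur->sg++;
            cur->off = 0;
        }
    }
}
```

For example, with four 4096-byte CPU entries merged into two 8192-byte DMA entries, sending 6000 bytes leaves the CPU-side cursor at entry 1, offset 1904, while the DMA-side cursor is at entry 0, offset 6000. Reusing the CPU-side position to index the DMA list would resume far past the correct byte, which is why the fix keeps a dedicated pair (op_dmasg, op_dmaoff).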
  17. 01 Jun, 2015 2 commits
  18. 19 May, 2015 1 commit
  19. 09 Apr, 2015 1 commit
  20. 03 Mar, 2015 1 commit
  21. 24 Nov, 2014 2 commits
  22. 20 Oct, 2013 1 commit
  23. 20 Mar, 2012 1 commit
  24. 01 Nov, 2011 1 commit
  25. 20 Jan, 2011 1 commit