1. 22 8月, 2012 1 次提交
    • J
      libceph: avoid truncation due to racing banners · 6d4221b5
      Jim Schutt 提交于
      Because the Ceph client messenger uses a non-blocking connect, it is
      possible for the sending of the client banner to race with the
      arrival of the banner sent by the peer.
      
      When ceph_sock_state_change() notices the connect has completed, it
      schedules work to process the socket via con_work().  During this
      time the peer is writing its banner, and arrival of the peer banner
      races with con_work().
      
      If con_work() calls try_read() before the peer banner arrives, there
      is nothing for it to do, after which con_work() calls try_write() to
      send the client's banner.  In this case Ceph's protocol negotiation
      can complete succesfully.
      
      The server-side messenger immediately sends its banner and addresses
      after accepting a connect request, *before* actually attempting to
      read or verify the banner from the client.  As a result, it is
      possible for the banner from the server to arrive before con_work()
      calls try_read().  If that happens, try_read() will read the banner
      and prepare protocol negotiation info via prepare_write_connect().
      prepare_write_connect() calls con_out_kvec_reset(), which discards
      the as-yet-unsent client banner.  Next, con_work() calls
      try_write(), which sends the protocol negotiation info rather than
      the banner that the peer is expecting.
      
      The result is that the peer sees an invalid banner, and the client
      reports "negotiation failed".
      
      Fix this by moving con_out_kvec_reset() out of
      prepare_write_connect() to its callers at all locations except the
      one where the banner might still need to be sent.
      
      [elder@inktak.com: added note about server-side behavior]
      Signed-off-by: NJim Schutt <jaschut@sandia.gov>
      Reviewed-by: NAlex Elder <elder@inktank.com>
      6d4221b5
  2. 31 7月, 2012 21 次提交
  3. 18 7月, 2012 1 次提交
    • S
      libceph: fix messenger retry · 5bdca4e0
      Sage Weil 提交于
      In ancient times, the messenger could both initiate and accept connections.
      An artifact if that was data structures to store/process an incoming
      ceph_msg_connect request and send an outgoing ceph_msg_connect_reply.
      Sadly, the negotiation code was referencing those structures and ignoring
      important information (like the peer's connect_seq) from the correct ones.
      
      Among other things, this fixes tight reconnect loops where the server sends
      RETRY_SESSION and we (the client) retries with the same connect_seq as last
      time.  This bug pretty easily triggered by injecting socket failures on the
      MDS and running some fs workload like workunits/direct_io/test_sync_io.
      Signed-off-by: NSage Weil <sage@inktank.com>
      5bdca4e0
  4. 06 7月, 2012 17 次提交
    • S
      libceph: allow sock transition from CONNECTING to CLOSED · fbb85a47
      Sage Weil 提交于
      It is possible to close a socket that is in the OPENING state.  For
      example, it can happen if ceph_con_close() is called on the con before
      the TCP connection is established.  con_work() will come around and shut
      down the socket.
      Signed-off-by: NSage Weil <sage@inktank.com>
      fbb85a47
    • S
      libceph: set peer name on con_open, not init · b7a9e5dd
      Sage Weil 提交于
      The peer name may change on each open attempt, even when the connection is
      reused.
      Signed-off-by: NSage Weil <sage@inktank.com>
      b7a9e5dd
    • A
      libceph: add some fine ASCII art · bc18f4b1
      Alex Elder 提交于
      Sage liked the state diagram I put in my commit description so
      I'm putting it in with the code.
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      bc18f4b1
    • A
      libceph: small changes to messenger.c · 5821bd8c
      Alex Elder 提交于
      This patch gathers a few small changes in "net/ceph/messenger.c":
        out_msg_pos_next()
          - small logic change that mostly affects indentation
        write_partial_msg_pages().
          - use a local variable trail_off to represent the offset into
            a message of the trail portion of the data (if present)
          - once we are in the trail portion we will always be there, so we
            don't always need to check against our data position
          - avoid computing len twice after we've reached the trail
          - get rid of the variable tmpcrc, which is not needed
          - trail_off and trail_len never change so mark them const
          - update some comments
        read_partial_message_bio()
          - bio_iovec_idx() will never return an error, so don't bother
            checking for it
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      5821bd8c
    • A
      libceph: distinguish two phases of connect sequence · 7593af92
      Alex Elder 提交于
      Currently a ceph connection enters a "CONNECTING" state when it
      begins the process of (re-)connecting with its peer.  Once the two
      ends have successfully exchanged their banner and addresses, an
      additional NEGOTIATING bit is set in the ceph connection's state to
      indicate the connection information exhange has begun.  The
      CONNECTING bit/state continues to be set during this phase.
      
      Rather than have the CONNECTING state continue while the NEGOTIATING
      bit is set, interpret these two phases as distinct states.  In other
      words, when NEGOTIATING is set, clear CONNECTING.  That way only
      one of them will be active at a time.
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      7593af92
    • A
      libceph: separate banner and connect writes · ab166d5a
      Alex Elder 提交于
      There are two phases in the process of linking together the two ends
      of a ceph connection.  The first involves exchanging a banner and
      IP addresses, and if that is successful a second phase exchanges
      some detail about each side's connection capabilities.
      
      When initiating a connection, the client side now queues to send
      its information for both phases of this process at the same time.
      This is probably a bit more efficient, but it is slightly messier
      from a layering perspective in the code.
      
      So rearrange things so that the client doesn't send the connection
      information until it has received and processed the response in the
      initial banner phase (in process_banner()).
      
      Move the code (in the (con->sock == NULL) case in try_write()) that
      prepares for writing the connection information, delaying doing that
      until the banner exchange has completed.  Move the code that begins
      the transition to this second "NEGOTIATING" phase out of
      process_banner() and into its caller, so preparing to write the
      connection information and preparing to read the response are
      adjacent to each other.
      
      Finally, preparing to write the connection information now requires
      the output kvec to be reset in all cases, so move that into the
      prepare_write_connect() and delete it from all callers.
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      ab166d5a
    • A
      libceph: define and use an explicit CONNECTED state · e27947c7
      Alex Elder 提交于
      There is no state explicitly defined when a ceph connection is fully
      operational.  So define one.
      
      It's set when the connection sequence completes successfully, and is
      cleared when the connection gets closed.
      
      Be a little more careful when examining the old state when a socket
      disconnect event is reported.
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      e27947c7
    • A
      libceph: clear NEGOTIATING when done · 3ec50d18
      Alex Elder 提交于
      A connection state's NEGOTIATING bit gets set while in CONNECTING
      state after we have successfully exchanged a ceph banner and IP
      addresses with the connection's peer (the server).  But that bit
      is not cleared again--at least not until another connection attempt
      is initiated.
      
      Instead, clear it as soon as the connection is fully established.
      Also, clear it when a socket connection gets prematurely closed
      in the midst of establishing a ceph connection (in case we had
      reached the point where it was set).
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      3ec50d18
    • A
      libceph: clear CONNECTING in ceph_con_close() · bb9e6bba
      Alex Elder 提交于
      A connection that is closed will no longer be connecting.  So
      clear the CONNECTING state bit in ceph_con_close().  Similarly,
      if the socket has been closed we no longer are in connecting
      state (a new connect sequence will need to be initiated).
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      bb9e6bba
    • A
      libceph: don't touch con state in con_close_socket() · 456ea468
      Alex Elder 提交于
      In con_close_socket(), a connection's SOCK_CLOSED flag gets set and
      then cleared while its shutdown method is called and its reference
      gets dropped.
      
      Previously, that flag got set only if it had not already been set,
      so setting it in con_close_socket() might have prevented additional
      processing being done on a socket being shut down.  We no longer set
      SOCK_CLOSED in the socket event routine conditionally, so setting
      that bit here no longer provides whatever benefit it might have
      provided before.
      
      A race condition could still leave the SOCK_CLOSED bit set even
      after we've issued the call to con_close_socket(), so we still clear
      that bit after shutting the socket down.  Add a comment explaining
      the reason for this.
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      456ea468
    • A
      libceph: just set SOCK_CLOSED when state changes · d65c9e0b
      Alex Elder 提交于
      When a TCP_CLOSE or TCP_CLOSE_WAIT event occurs, the SOCK_CLOSED
      connection flag bit is set, and if it had not been previously set
      queue_con() is called to ensure con_work() will get a chance to
      handle the changed state.
      
      con_work() atomically checks--and if set, clears--the SOCK_CLOSED
      bit if it was set.  This means that even if the bit were set
      repeatedly, the related processing in con_work() only gets called
      once per transition of the bit from 0 to 1.
      
      What's important then is that we ensure con_work() gets called *at
      least* once when a socket close event occurs, not that it gets
      called *exactly* once.
      
      The work queue mechanism already takes care of queueing work
      only if it is not already queued, so there's no need for us
      to call queue_con() conditionally.
      
      So this patch just makes it so the SOCK_CLOSED flag gets set
      unconditionally in ceph_sock_state_change().
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      d65c9e0b
    • A
      libceph: don't change socket state on sock event · 188048bc
      Alex Elder 提交于
      Currently the socket state change event handler records an error
      message on a connection to distinguish a close while connecting from
      a close while a connection was already established.
      
      Changing connection information during handling of a socket event is
      not very clean, so instead move this assignment inside con_work(),
      where it can be done during normal connection-level processing (and
      under protection of the connection mutex as well).
      
      Move the handling of a socket closed event up to the top of the
      processing loop in con_work(); there's no point in handling backoff
      etc. if we have a newly-closed socket to take care of.
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      188048bc
    • A
      libceph: SOCK_CLOSED is a flag, not a state · a8d00e3c
      Alex Elder 提交于
      The following commit changed it so SOCK_CLOSED bit was stored in
      a connection's new "flags" field rather than its "state" field.
      
          libceph: start separating connection flags from state
          commit 928443cd
      
      That bit is used in con_close_socket() to protect against setting an
      error message more than once in the socket event handler function.
      
      Unfortunately, the field being operated on in that function was not
      updated to be "flags" as it should have been.  This fixes that
      error.
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      a8d00e3c
    • A
      libceph: don't use bio_iter as a flag · abdaa6a8
      Alex Elder 提交于
      Recently a bug was fixed in which the bio_iter field in a ceph
      message was not being properly re-initialized when a message got
      re-transmitted:
          commit 43643528
          Author: Yan, Zheng <zheng.z.yan@intel.com>
          rbd: Clear ceph_msg->bio_iter for retransmitted message
      
      We are now only initializing the bio_iter field when we are about to
      start to write message data (in prepare_write_message_data()),
      rather than every time we are attempting to write any portion of the
      message data (in write_partial_msg_pages()).  This means we no
      longer need to use the msg->bio_iter field as a flag.
      
      So just don't do that any more.  Trust prepare_write_message_data()
      to ensure msg->bio_iter is properly initialized, every time we are
      about to begin writing (or re-writing) a message's bio data.
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      abdaa6a8
    • A
      libceph: move init of bio_iter · 572c588e
      Alex Elder 提交于
      If a message has a non-null bio pointer, its bio_iter field is
      initialized in write_partial_msg_pages() if this has not been done
      already.  This is really a one-time setup operation for sending a
      message's (bio) data, so move that initialization code into
      prepare_write_message_data() which serves that purpose.
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      572c588e
    • A
      libceph: move init_bio_*() functions up · df6ad1f9
      Alex Elder 提交于
      Move init_bio_iter() and iter_bio_next() up in their source file so
      the'll be defined before they're needed.
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      df6ad1f9
    • A
      libceph: don't mark footer complete before it is · fd154f3c
      Alex Elder 提交于
      This is a nit, but prepare_write_message() sets the FOOTER_COMPLETE
      flag before the CRC for the data portion (recorded in the footer)
      has been completely computed.  Hold off setting the complete flag
      until we've decided it's ready to send.
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      fd154f3c