1. 21 Dec 2012 (1 commit)
    • libceph: report connection fault with warning · 28362986
      Committed by Alex Elder
      When a connection's socket disconnects, or if there's a protocol
      error of some kind on the connection, a fault is signaled and
      the connection is reset (closed and reopened, basically).  We
      currently get an error message on the log whenever this occurs.
      
      A ceph connection will attempt to reestablish a socket connection
      repeatedly if a fault occurs.  This means that these error messages
      will get repeatedly added to the log, which is undesirable.
      
      Change the error message to be a warning, so they don't get
      logged by default.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
  2. 18 Dec 2012 (1 commit)
    • libceph: socket can close in any connection state · 7bb21d68
      Committed by Alex Elder
      A connection's socket can close for any reason, independent of the
      state of the connection (and irrespective of the connection
      mutex).  As a result, the connection can be in pretty much any state
      at the time its socket is closed.
      
      Handle those other cases at the top of con_work().  Pull this whole
      block of code into a separate function to reduce the clutter.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
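The following is a minimal user-space sketch of the idea. It assumes a simplified connection that uses a plain sock_closed flag in place of the kernel's flag bits. The names handle_sock_closed(), struct con and the state names are illustrative, not the messenger's actual definitions. The check runs first and covers every state, because the socket may close no matter what the connection is doing.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Illustrative connection states; not the kernel's definitions. */
enum con_state {
    CON_CLOSED, CON_PREOPEN, CON_CONNECTING,
    CON_NEGOTIATING, CON_OPEN, CON_STANDBY
};

struct con {
    enum con_state state;
    bool sock_closed;       /* set asynchronously when the socket closes */
    const char *error_msg;  /* reason handed to the fault path */
};

/*
 * Called first thing in a con_work()-style routine (hypothetical helper).
 * Consumes the "socket closed" event and returns true if the caller
 * should run its fault handling.  Every state gets a case.
 */
static bool handle_sock_closed(struct con *con)
{
    if (!con->sock_closed)
        return false;
    con->sock_closed = false;  /* consume the event */

#define CASE(x) \
    case CON_##x: con->error_msg = "socket closed (con state " #x ")"; break
    switch (con->state) {
    CASE(CLOSED);
    CASE(PREOPEN);
    CASE(CONNECTING);
    CASE(NEGOTIATING);
    CASE(OPEN);
    CASE(STANDBY);
    default:
        con->error_msg = "unrecognized con state";
        break;
    }
#undef CASE
    return true;
}
```

Moving this into its own helper keeps the top of the work function to a single call followed by a jump to the fault path.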
  3. 27 Oct 2012 (1 commit)
    • libceph: avoid NULL kref_put from NULL alloc_msg return · 7246240c
      Committed by Sage Weil
      The ceph_con_in_msg_alloc() method calls the ->alloc_msg() helper which
      may return NULL.  It also drops con->mutex while it allocates a message,
      which means that the connection state may change (e.g., get closed).  If
      that happens, we clean up and bail out.  Avoid calling ceph_msg_put() on
      a NULL return value and triggering a crash.
      
      This was observed when an ->alloc_msg() call races with a timeout that
      resends a zillion messages and resets the connection, and ->alloc_msg()
      returns NULL (because the request was resent to another target).
      
      Fixes http://tracker.newdream.net/issues/3342
      Signed-off-by: Sage Weil <sage@inktank.com>
      Reviewed-by: Alex Elder <elder@inktank.com>
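Here is a user-space sketch of the guard. It assumes the message is a simple refcounted struct and that the allocator callback may return NULL. The functions in_msg_alloc() and msg_put() and the two allocator stubs are hypothetical stand-ins for the messenger's allocation path, ceph_msg_put() and ->alloc_msg(). The closed argument is the only model of the lock being dropped around the allocation.

```c
#include <assert.h>
#include <stdlib.h>

struct msg { int refs; };

static int msgs_freed;  /* counts frees so the example can be checked */

static void msg_put(struct msg *m)
{
    if (--m->refs == 0) {
        free(m);
        msgs_freed++;
    }
}

/* Allocator callback; may legitimately return NULL. */
typedef struct msg *(*alloc_msg_fn)(void);

static struct msg *alloc_msg_ok(void)
{
    struct msg *m = malloc(sizeof(*m));
    if (m)
        m->refs = 1;
    return m;
}

static struct msg *alloc_msg_none(void)
{
    return NULL;  /* e.g. the request was resent to another target */
}

/*
 * The lock would be dropped around alloc(); "closed" says whether the
 * connection was closed meanwhile.  Returns -1 when the caller must bail.
 */
static int in_msg_alloc(alloc_msg_fn alloc, int closed, struct msg **out)
{
    struct msg *m = alloc();

    *out = NULL;
    if (closed) {
        if (m)          /* the fix: only put a message we actually got */
            msg_put(m);
        return -1;
    }
    *out = m;
    return 0;
}
```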
  4. 10 Oct 2012 (3 commits)
    • rbd: define common queue_con_delay() · 802c6d96
      Committed by Alex Elder
      This patch defines a single function, queue_con_delay(), to call
      queue_delayed_work() for a connection.  It basically generalizes
      what was previously queue_con() by adding the delay argument.
      queue_con() is now a simple helper that passes 0 for its delay.
      queue_con_delay() returns 0 if it queued work or an errno if it
      did not for some reason.
      
      If con_work() finds the BACKOFF flag set for a connection, it now
      calls queue_con_delay() to handle arranging to start again after a
      delay.
      
      Note about connection reference counts:  con_work() only ever gets
      called as a work item function.  At the time that work is scheduled,
      a reference to the connection is acquired, and the corresponding
      con_work() call is then responsible for dropping that reference
      before it returns.
      
      Previously, the backoff handling inside con_work() silently handed
      off its reference to delayed work it scheduled.  Now that
      queue_con_delay() is used, a new reference is acquired for the
      newly-scheduled work, and the original reference is dropped by the
      con->ops->put() call at the end of the function.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
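This sketch shows the reference-count contract described above. It assumes a toy connection and a fake queue_delayed_work() that only reports whether the item was already pending. The -EBUSY return value and the fake_queue_delayed_work() helper are assumptions made for illustration. Each successfully queued work item owns one reference, and a failed queue attempt hands its reference straight back.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct con {
    int refs;                 /* connection reference count */
    bool work_pending;        /* stands in for the delayed work's pending bit */
    unsigned long work_delay; /* delay the pending work was queued with */
};

static void con_get(struct con *con) { con->refs++; }
static void con_put(struct con *con) { con->refs--; }

/* Stand-in for queue_delayed_work(): returns false if already pending. */
static bool fake_queue_delayed_work(struct con *con, unsigned long delay)
{
    if (con->work_pending)
        return false;
    con->work_pending = true;
    con->work_delay = delay;
    return true;
}

/* Each queued work item owns one reference; 0 on success, else an errno. */
static int queue_con_delay(struct con *con, unsigned long delay)
{
    con_get(con);
    if (!fake_queue_delayed_work(con, delay)) {
        con_put(con);  /* nothing was queued, so give the reference back */
        return -EBUSY;
    }
    return 0;
}

/* queue_con() becomes the zero-delay special case. */
static void queue_con(struct con *con)
{
    (void)queue_con_delay(con, 0);
}
```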
    • rbd: let con_work() handle backoff · 8618e30b
      Committed by Alex Elder
      Both ceph_fault() and con_work() include handling for imposing a
      delay before doing further processing on a faulted connection.
      The latter is used only if ceph_fault() is unable to.
      
      Instead, just let con_work() always be responsible for implementing
      the delay.  After setting up the delay value, set the BACKOFF flag
      on the connection unconditionally and call queue_con() to ensure
      con_work() will get called to handle it.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
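This sketch covers the fault-side half. It assumes an exponential backoff that doubles up to a cap. BASE_DELAY, MAX_DELAY and fault_set_backoff() are hypothetical names and values, and queue_con() here only counts requests. The fault path never waits itself: it records the delay, sets the flag unconditionally, and makes sure con_work() will run.

```c
#include <assert.h>
#include <stdbool.h>

#define BASE_DELAY 1UL  /* hypothetical initial backoff, in ticks */
#define MAX_DELAY 64UL  /* hypothetical cap on the backoff */

struct con {
    unsigned long delay;
    bool backoff;     /* stands in for the BACKOFF flag */
    int queue_calls;  /* number of times con_work() was requested */
};

static void queue_con(struct con *con) { con->queue_calls++; }

/* Fault path: pick the delay, but leave the waiting itself to con_work(). */
static void fault_set_backoff(struct con *con)
{
    if (con->delay == 0)
        con->delay = BASE_DELAY;
    else if (con->delay < MAX_DELAY)
        con->delay *= 2;
    con->backoff = true;  /* set unconditionally */
    queue_con(con);       /* ensure con_work() runs and sees the flag */
}
```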
    • rbd: reset BACKOFF if unable to re-queue · 588377d6
      Committed by Alex Elder
      If ceph_fault() is unable to queue work after a delay, it sets the
      BACKOFF connection flag so con_work() will attempt to do so.
      
      In con_work(), when BACKOFF is set, if queue_delayed_work() doesn't
      result in newly-queued work, it simply ignores this condition and
      proceeds as if no backoff delay were desired.  There are two
      problems with this, one of which is a bug.
      
      The first problem is simply that the intended behavior is to back
      off, and if we aren't able to queue the work item to run after a
      delay we're not doing that.
      
      The only reason queue_delayed_work() won't queue work is if the
      provided work item is already queued.  In the messenger, this
      means that con_work() is already scheduled to be run again.  So
      if we simply set the BACKOFF flag again when this occurs, we know
      the next con_work() call will again attempt to hold off activity
      on the connection until after the delay.
      
      The second problem, the bug, is a reference count imbalance.  If
      queue_delayed_work() returns 0 in con_work(), con->ops->put() drops
      the connection reference held on entry to con_work().  However,
      processing is (was) allowed to continue, and at the end of the
      function a second con->ops->put() is called, dropping the same
      reference twice.
      
      This patch fixes both problems.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
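This sketch shows the fixed con_work() logic. It assumes a toy queue_con_delay() that takes a reference only when it actually queues, and that reports an already-pending item as -EBUSY. The fields and counters are hypothetical. A real work queue clears the pending bit before running the item; here the test sets work_pending directly to model a run that was queued again meanwhile. If re-queueing fails, the flag is set again so that the already-queued run backs off instead. The entry reference is dropped in exactly one place.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct con {
    int refs;
    bool backoff;       /* stands in for the BACKOFF flag */
    bool work_pending;  /* true if the work item is already queued again */
    unsigned long delay;
    int processed;      /* counts passes that did real work */
};

static void con_put(struct con *con) { con->refs--; }

/* Takes a reference only if it actually queues the work. */
static int queue_con_delay(struct con *con, unsigned long delay)
{
    (void)delay;  /* a real queue would arm a timer with this */
    if (con->work_pending)
        return -EBUSY;
    con->refs++;
    con->work_pending = true;
    return 0;
}

/* Entered holding one reference, which must be dropped exactly once. */
static void con_work(struct con *con)
{
    if (con->backoff) {
        con->backoff = false;
        if (queue_con_delay(con, con->delay) != 0)
            con->backoff = true;  /* work already queued: it will back off */
        goto done;                /* either way, do nothing else now */
    }
    con->processed++;
done:
    con_put(con);  /* the single put for the entry reference */
}
```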
  5. 22 Sep 2012 (1 commit)
    • libceph: only kunmap kmapped pages · 5ce765a5
      Committed by Alex Elder
      In write_partial_msg_pages(), pages need to be kmapped in order to
      perform a CRC-32c calculation on them.  As an artifact of the way
      this code used to be structured, the kunmap() call was separated
      from the kmap() call and both were done conditionally.  But the
      conditions under which the kmap() and kunmap() calls were made
      differed, so there was a chance a kunmap() call would be done on a
      page that had not been mapped.
      
      The symptom of this was tripping a BUG() in kunmap_high() when
      pkmap_count[nr] became 0.
      Reported-by: Bryan K. Wright <bryan@virginia.edu>
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
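This user-space sketch illustrates the pairing rule. It assumes fake_kmap() and fake_kunmap() helpers that only track a mapping count in place of pkmap_count. The CRC is a plain bitwise CRC-32C (Castagnoli) with the usual pre/post inversion. That may not match the seed convention used by the kernel's crc32c() callers. Mapping and unmapping sit under the same condition, so an unmap without a matching map cannot happen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static int map_count;  /* stands in for pkmap_count; must never go negative */

static const uint8_t *fake_kmap(const uint8_t *page)
{
    map_count++;
    return page;
}

static void fake_kunmap(const uint8_t *page)
{
    (void)page;
    assert(map_count > 0);  /* the commit reports a BUG() in this situation */
    map_count--;
}

/* Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78). */
static uint32_t crc32c_bitwise(uint32_t crc, const uint8_t *p, size_t len)
{
    crc = ~crc;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
    }
    return ~crc;
}

/* Map, checksum and unmap under one condition so the calls always pair up. */
static uint32_t write_page(const uint8_t *page, size_t len, bool do_crc,
                           uint32_t crc)
{
    if (do_crc) {
        const uint8_t *kaddr = fake_kmap(page);
        crc = crc32c_bitwise(crc, kaddr, len);
        fake_kunmap(page);
    }
    /* ... hand the page to the socket here ... */
    return crc;
}
```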
  6. 22 Aug 2012 (1 commit)
    • libceph: avoid truncation due to racing banners · 6d4221b5
      Committed by Jim Schutt
      Because the Ceph client messenger uses a non-blocking connect, it is
      possible for the sending of the client banner to race with the
      arrival of the banner sent by the peer.
      
      When ceph_sock_state_change() notices the connect has completed, it
      schedules work to process the socket via con_work().  During this
      time the peer is writing its banner, and arrival of the peer banner
      races with con_work().
      
      If con_work() calls try_read() before the peer banner arrives, there
      is nothing for it to do, after which con_work() calls try_write() to
      send the client's banner.  In this case Ceph's protocol negotiation
      can complete successfully.
      
      The server-side messenger immediately sends its banner and addresses
      after accepting a connect request, *before* actually attempting to
      read or verify the banner from the client.  As a result, it is
      possible for the banner from the server to arrive before con_work()
      calls try_read().  If that happens, try_read() will read the banner
      and prepare protocol negotiation info via prepare_write_connect().
      prepare_write_connect() calls con_out_kvec_reset(), which discards
      the as-yet-unsent client banner.  Next, con_work() calls
      try_write(), which sends the protocol negotiation info rather than
      the banner that the peer is expecting.
      
      The result is that the peer sees an invalid banner, and the client
      reports "negotiation failed".
      
      Fix this by moving con_out_kvec_reset() out of
      prepare_write_connect() to its callers at all locations except the
      one where the banner might still need to be sent.
      
      [elder@inktank.com: added note about server-side behavior]
      Signed-off-by: Jim Schutt <jaschut@sandia.gov>
      Reviewed-by: Alex Elder <elder@inktank.com>
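This sketch shows where the reset belongs after the fix. It assumes a toy output queue of string pieces. The helpers out_kvec_reset(), out_kvec_add(), prepare_connect(), on_peer_banner() and on_retry() are hypothetical stand-ins for con_out_kvec_reset(), prepare_write_connect() and their callers. With the fix, the banner-received path no longer discards a client banner that is still waiting to be sent.

```c
#include <assert.h>
#include <string.h>

#define MAX_KVEC 4

/* Simplified output queue: pieces waiting to be written to the socket. */
struct out_queue {
    const char *kvec[MAX_KVEC];
    int n;
};

static void out_kvec_reset(struct out_queue *q) { q->n = 0; }

static void out_kvec_add(struct out_queue *q, const char *piece)
{
    assert(q->n < MAX_KVEC);
    q->kvec[q->n++] = piece;
}

/* After the fix this only appends; it no longer resets the queue. */
static void prepare_connect(struct out_queue *q)
{
    out_kvec_add(q, "CONNECT");
}

/* Peer banner arrived first: our own banner may be unsent, so keep it. */
static void on_peer_banner(struct out_queue *q)
{
    prepare_connect(q);
}

/* Retry path: anything queued is stale, so the caller resets first. */
static void on_retry(struct out_queue *q)
{
    out_kvec_reset(q);
    prepare_connect(q);
}
```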
  7. 31 Jul 2012 (21 commits)
  8. 18 Jul 2012 (1 commit)
    • libceph: fix messenger retry · 5bdca4e0
      Committed by Sage Weil
      In ancient times, the messenger could both initiate and accept connections.
      An artifact of that was a set of data structures to store/process an incoming
      ceph_msg_connect request and send an outgoing ceph_msg_connect_reply.
      Sadly, the negotiation code was referencing those structures and ignoring
      important information (like the peer's connect_seq) from the correct ones.
      
      Among other things, this fixes tight reconnect loops where the server sends
      RETRY_SESSION and we (the client) retry with the same connect_seq as last
      time.  This bug was pretty easily triggered by injecting socket failures on
      the MDS and running some fs workload like workunits/direct_io/test_sync_io.
      Signed-off-by: Sage Weil <sage@inktank.com>
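This sketch shows the corrected retry handling. It assumes a simplified reply structure whose tag values are not the real wire constants. The functions handle_connect_reply() and send_connect() are hypothetical. On RETRY_SESSION, the client adopts the connect_seq from the reply it actually received before reconnecting, instead of reusing the value from its previous attempt.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the connect reply and its tags (not wire values). */
enum reply_tag { TAG_READY, TAG_RETRY_SESSION };

struct connect_reply {
    enum reply_tag tag;
    uint32_t connect_seq;  /* the peer's view of the session sequence */
};

struct con {
    uint32_t connect_seq;
    int connects_sent;
};

static void send_connect(struct con *con) { con->connects_sent++; }

/* Handle the reply actually received for this connect attempt. */
static void handle_connect_reply(struct con *con,
                                 const struct connect_reply *in_reply)
{
    switch (in_reply->tag) {
    case TAG_RETRY_SESSION:
        /* Retry with the peer's connect_seq, not our stale one. */
        con->connect_seq = in_reply->connect_seq;
        send_connect(con);
        break;
    case TAG_READY:
        break;
    }
}
```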
  9. 06 Jul 2012 (10 commits)
    • libceph: allow sock transition from CONNECTING to CLOSED · fbb85a47
      Committed by Sage Weil
      It is possible to close a socket that is in the OPENING state.  For
      example, it can happen if ceph_con_close() is called on the con before
      the TCP connection is established.  con_work() will come around and shut
      down the socket.
      Signed-off-by: Sage Weil <sage@inktank.com>
    • libceph: set peer name on con_open, not init · b7a9e5dd
      Committed by Sage Weil
      The peer name may change on each open attempt, even when the connection is
      reused.
      Signed-off-by: Sage Weil <sage@inktank.com>
    • libceph: add some fine ASCII art · bc18f4b1
      Committed by Alex Elder
      Sage liked the state diagram I put in my commit description so
      I'm putting it in with the code.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • libceph: small changes to messenger.c · 5821bd8c
      Committed by Alex Elder
      This patch gathers a few small changes in "net/ceph/messenger.c":
        out_msg_pos_next()
          - small logic change that mostly affects indentation
        write_partial_msg_pages()
          - use a local variable trail_off to represent the offset into
            a message of the trail portion of the data (if present)
          - once we are in the trail portion we will always be there, so we
            don't always need to check against our data position
          - avoid computing len twice after we've reached the trail
          - get rid of the variable tmpcrc, which is not needed
          - trail_off and trail_len never change so mark them const
          - update some comments
        read_partial_message_bio()
          - bio_iovec_idx() will never return an error, so don't bother
            checking for it
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
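This sketch shows the trail_off bookkeeping described in the list above. It assumes that the message data length includes a trail of trail_len bytes at its end, and that writes are split at a hypothetical PAGE_SZ boundary. The function next_write_len() is illustrative, not the kernel function. trail_off is computed once and kept const. Once the position reaches the trail, the in_trail flag short-circuits the comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SZ 4096u  /* hypothetical page size */

/*
 * How many bytes the next write may cover.  data_len includes the trail,
 * which occupies the last trail_len bytes of the message data.
 */
static size_t next_write_len(size_t data_len, size_t trail_len, size_t pos,
                             size_t page_off, bool *in_trail)
{
    const size_t trail_off = data_len - trail_len;  /* never changes */
    size_t max_write;

    if (*in_trail || pos >= trail_off) {
        *in_trail = true;             /* once in the trail, always in it */
        max_write = data_len - pos;
    } else {
        max_write = trail_off - pos;  /* stop at the start of the trail */
    }
    size_t page_left = PAGE_SZ - page_off;
    return max_write < page_left ? max_write : page_left;
}
```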
    • libceph: distinguish two phases of connect sequence · 7593af92
      Committed by Alex Elder
      Currently a ceph connection enters a "CONNECTING" state when it
      begins the process of (re-)connecting with its peer.  Once the two
      ends have successfully exchanged their banner and addresses, an
      additional NEGOTIATING bit is set in the ceph connection's state to
      indicate the connection information exchange has begun.  The
      CONNECTING bit/state continues to be set during this phase.
      
      Rather than have the CONNECTING state continue while the NEGOTIATING
      bit is set, interpret these two phases as distinct states.  In other
      words, when NEGOTIATING is set, clear CONNECTING.  That way only
      one of them will be active at a time.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
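This sketch shows the new rule using hypothetical state bits: entering NEGOTIATING clears CONNECTING, so the two phases are never set together. The bit names and helper functions are assumptions, not the messenger's definitions.

```c
#include <assert.h>

/* Hypothetical state bits, modeled on the commit's description. */
#define CONNECTING  (1u << 0)
#define NEGOTIATING (1u << 1)

struct con { unsigned int state; };

static void start_connect(struct con *con)
{
    con->state |= CONNECTING;
}

/* Banner exchange done: enter NEGOTIATING and leave CONNECTING. */
static void begin_negotiation(struct con *con)
{
    con->state |= NEGOTIATING;
    con->state &= ~CONNECTING;  /* the change: never both at once */
}
```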
    • libceph: separate banner and connect writes · ab166d5a
      Committed by Alex Elder
      There are two phases in the process of linking together the two ends
      of a ceph connection.  The first involves exchanging a banner and
      IP addresses, and if that is successful a second phase exchanges
      some detail about each side's connection capabilities.
      
      When initiating a connection, the client side now queues to send
      its information for both phases of this process at the same time.
      This is probably a bit more efficient, but it is slightly messier
      from a layering perspective in the code.
      
      So rearrange things so that the client doesn't send the connection
      information until it has received and processed the response in the
      initial banner phase (in process_banner()).
      
      Move the code (in the (con->sock == NULL) case in try_write()) that
      prepares for writing the connection information, delaying doing that
      until the banner exchange has completed.  Move the code that begins
      the transition to this second "NEGOTIATING" phase out of
      process_banner() and into its caller, so preparing to write the
      connection information and preparing to read the response are
      adjacent to each other.
      
      Finally, preparing to write the connection information now requires
      the output kvec to be reset in all cases, so move that into the
      prepare_write_connect() and delete it from all callers.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • libceph: define and use an explicit CONNECTED state · e27947c7
      Committed by Alex Elder
      There is no state explicitly defined when a ceph connection is fully
      operational.  So define one.
      
      It's set when the connection sequence completes successfully, and is
      cleared when the connection gets closed.
      
      Be a little more careful when examining the old state when a socket
      disconnect event is reported.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • libceph: clear NEGOTIATING when done · 3ec50d18
      Committed by Alex Elder
      A connection state's NEGOTIATING bit gets set while in CONNECTING
      state after we have successfully exchanged a ceph banner and IP
      addresses with the connection's peer (the server).  But that bit
      is not cleared again--at least not until another connection attempt
      is initiated.
      
      Instead, clear it as soon as the connection is fully established.
      Also, clear it when a socket connection gets prematurely closed
      in the midst of establishing a ceph connection (in case we had
      reached the point where it was set).
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • libceph: clear CONNECTING in ceph_con_close() · bb9e6bba
      Committed by Alex Elder
      A connection that is closed will no longer be connecting.  So
      clear the CONNECTING state bit in ceph_con_close().  Similarly,
      if the socket has been closed we are no longer in the connecting
      state (a new connect sequence will need to be initiated).
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
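This sketch combines the bit handling described by the CONNECTED, NEGOTIATING and CONNECTING commits above. It assumes hypothetical CONNECTING, NEGOTIATING, CONNECTED and CLOSED bits and illustrative helper names. It shows three cases:

- A finished handshake leaves only CONNECTED set.
- A socket dropped mid-handshake clears both handshake bits.
- A close clears CONNECTING and CONNECTED.

```c
#include <assert.h>

/* Hypothetical state bits, loosely modeled on these commits. */
#define CONNECTING  (1u << 0)
#define NEGOTIATING (1u << 1)
#define CONNECTED   (1u << 2)
#define CLOSED      (1u << 3)

struct con { unsigned int state; };

/* Handshake finished: only CONNECTED remains set. */
static void connect_done(struct con *con)
{
    con->state &= ~(CONNECTING | NEGOTIATING);
    con->state |= CONNECTED;
}

/* Socket dropped mid-handshake: a new connect sequence must start over. */
static void sock_closed(struct con *con)
{
    con->state &= ~(CONNECTING | NEGOTIATING);
}

/* ceph_con_close()-style close: no longer connecting or connected. */
static void con_close(struct con *con)
{
    con->state &= ~(CONNECTING | CONNECTED);
    con->state |= CLOSED;
}
```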
    • libceph: don't touch con state in con_close_socket() · 456ea468
      Committed by Alex Elder
      In con_close_socket(), a connection's SOCK_CLOSED flag gets set and
      then cleared while its shutdown method is called and its reference
      gets dropped.
      
      Previously, that flag got set only if it had not already been set,
      so setting it in con_close_socket() might have prevented additional
      processing being done on a socket being shut down.  We no longer set
      SOCK_CLOSED in the socket event routine conditionally, so setting
      that bit here no longer provides whatever benefit it might have
      provided before.
      
      A race condition could still leave the SOCK_CLOSED bit set even
      after we've issued the call to con_close_socket(), so we still clear
      that bit after shutting the socket down.  Add a comment explaining
      the reason for this.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
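This sketch shows why the flag is still cleared at the end. It assumes a fake shutdown that fires the state-change callback synchronously, standing in for the race. struct sock, fake_shutdown() and sock_state_change() here are hypothetical simplifications. Nothing sets the flag up front any more. The final clear discards a notification caused by our own shutdown, so that notification is not later treated as an unexpected close.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct con;

/* Hypothetical socket whose shutdown may fire the state-change callback. */
struct sock {
    struct con *owner;
};

struct con {
    struct sock *sock;
    bool sock_closed;  /* stands in for the SOCK_CLOSED flag */
};

/* Stand-in for the socket callback: it can run while we shut down. */
static void sock_state_change(struct sock *sk)
{
    sk->owner->sock_closed = true;
}

static void fake_shutdown(struct sock *sk)
{
    sock_state_change(sk);  /* the close notification races with us */
}

static void con_close_socket(struct con *con)
{
    if (con->sock) {
        fake_shutdown(con->sock);
        con->sock = NULL;
    }
    /* Clear whatever the racing callback may have left behind. */
    con->sock_closed = false;
}
```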