1. 19 Jun, 2012 1 commit
  2. 16 Jun, 2012 1 commit
  3. 07 Jun, 2012 1 commit
  4. 06 Jun, 2012 6 commits
    • libceph: make ceph_con_revoke_message() a msg op · 8921d114
      Alex Elder committed
      ceph_con_revoke_message() is passed both a message and a ceph
      connection.  A ceph_msg allocated for incoming messages on a
      connection always has a pointer to that connection, so there's no
      need to provide the connection when revoking such a message.
      
      Note that the existing logic does not preclude the message supplied
      being a null/bogus message pointer.  The only user of this interface
      is the OSD client, and the only value an osd client passes is a
      request's r_reply field.  That is always non-null (except briefly in
      an error path in ceph_osdc_alloc_request(), and that drops the
      only reference so the request won't ever have a reply to revoke).
      So we can safely assume the passed-in message is non-null, but add a
      BUG_ON() to make it very obvious we are imposing this restriction.
      
      Rename the function ceph_msg_revoke_incoming() to reflect that it is
      really an operation on an incoming message.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      8921d114
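As a rough illustration of the interface change described above, the following user-space sketch shows a revoke operation that finds the connection through the message itself. The structures and the name `msg_revoke_incoming()` are simplified, hypothetical stand-ins, not the kernel's definitions, and whatever the real function does about partially read data is not modeled here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct message;

struct connection {
    struct message *in_msg;   /* message currently being read, if any */
};

struct message {
    struct connection *con;   /* connection this incoming message belongs to */
};

/*
 * Revoke an incoming message. The connection is looked up via msg->con,
 * so the caller does not pass it in. Returns 1 if the message was the
 * one being read on its connection and has been detached, 0 otherwise.
 */
int msg_revoke_incoming(struct message *msg)
{
    struct connection *con;

    assert(msg != NULL);      /* models the BUG_ON() the commit adds */

    con = msg->con;
    if (con == NULL || con->in_msg != msg)
        return 0;             /* not currently being read anywhere */

    con->in_msg = NULL;
    msg->con = NULL;
    return 1;
}
```

The `assert()` plays the role of the `BUG_ON()`: a null message is treated as a programming error rather than a case to handle quietly.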
    • libceph: make ceph_con_revoke() a msg operation · 6740a845
      Alex Elder committed
      ceph_con_revoke() is passed both a message and a ceph connection.
      Now that any message associated with a connection holds a pointer
      to that connection, there's no need to provide the connection when
      revoking a message.
      
      This has the added benefit of precluding the possibility of
      providing the wrong connection pointer.  If the message's connection
      pointer is null, it is not being tracked by any connection, so
      revoking it is a no-op.  This is supported as a convenience for
      upper layers, so they can revoke a message that is not actually
      "in flight."
      
      Rename the function ceph_msg_revoke() to reflect that it is really
      an operation on a message, not a connection.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      6740a845
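The no-op behaviour for a message that is not tracked by any connection can be sketched as follows. This is a toy user-space model under stated assumptions: the `queued` counter and the names `msg_send()` and `msg_revoke()` are hypothetical simplifications of the connection's lists and the kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct connection {
    int queued;               /* number of messages the connection tracks */
};

struct message {
    struct connection *con;   /* NULL when no connection tracks the message */
};

/* Queue a message: from here on it is tracked by the connection. */
void msg_send(struct connection *con, struct message *msg)
{
    msg->con = con;
    con->queued++;
}

/*
 * Revoke a message. Only the message is passed; a NULL msg->con means
 * it is not "in flight", so there is nothing to do. Returns 1 if the
 * message was actually pulled off a connection, 0 for the no-op case.
 */
int msg_revoke(struct message *msg)
{
    if (msg->con == NULL)
        return 0;

    msg->con->queued--;
    msg->con = NULL;
    return 1;
}
```

Because the connection comes from the message, a caller cannot pass a mismatched connection pointer, which is the added benefit the commit message mentions.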
    • libceph: have messages take a connection reference · 92ce034b
      Alex Elder committed
      There are essentially two types of ceph messages: incoming and
      outgoing.  Outgoing messages are always allocated via ceph_msg_new(),
      and at the time of their allocation they are not associated with any
      particular connection.  Incoming messages are always allocated via
      ceph_con_in_msg_alloc(), and they are initially associated with the
      connection from which incoming data will be placed into the message.
      
      When an outgoing message gets sent, it becomes associated with a
      connection and remains that way until the message is successfully
      sent.  The association of an incoming message goes away at the point
      it is sent to an upper layer via a con->ops->dispatch method.
      
      This patch implements reference counting for all ceph messages, such
      that every message holds a reference (and a pointer) to a connection
      if and only if it is associated with that connection (as described
      above).
      
      
      For background, here is an explanation of the ceph message
      lifecycle, emphasizing when an association exists between a message
      and a connection.
      
      Outgoing Messages
      An outgoing message is "owned" by its allocator, from the time it is
      allocated in ceph_msg_new() up to the point it gets queued for
      sending in ceph_con_send().  Prior to that point the message's
      msg->con pointer is null; at the point it is queued for sending,
      that pointer is set to refer to the connection.  At that
      time the message is inserted into a connection's out_queue list.
      
      When a message on the out_queue list has been sent to the socket
      layer to be put on the wire, it is transferred out of that list and
      into the connection's out_sent list.  At that point it is still owned
      by the connection, and will remain so until an acknowledgement is
      received from the recipient that indicates the message was
      successfully transferred.  When such an acknowledgement is received
      (in process_ack()), the message is removed from its list (in
      ceph_msg_remove()), at which point it is no longer associated with
      the connection.
      
      So basically, any time a message is on one of a connection's lists,
      it is associated with that connection.  Reference counting outgoing
      messages can thus be done at the points a message is added to the
      out_queue (in ceph_con_send()) and the point it is removed from
      either of its two lists (in ceph_msg_remove())--at which point its
      connection pointer becomes null.
      
      Incoming Messages
      When an incoming message on a connection is getting read (in
      read_partial_message()) and there is no message in con->in_msg,
      a new one is allocated using ceph_con_in_msg_alloc().  At that
      point the message is associated with the connection.  Once that
      message has been completely and successfully read, it is passed to
      upper layer code using the connection's con->ops->dispatch method.
      At that point the association between the message and the connection
      no longer exists.
      
      Reference counting of connections for incoming messages can be done
      by taking a reference to the connection when the message gets
      allocated, and releasing that reference when it gets handed off
      using the dispatch method.
      
      We should never fail to get a connection reference for a
      message--since the caller should already hold one.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      92ce034b
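The lifecycle described in this commit message can be modeled in user space as below. This is only a sketch under stated assumptions: the lists are reduced to counters, the reference count is a plain integer, and every type and function name is a hypothetical stand-in rather than the kernel's own code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct connection {
    int refs;        /* reference count on the connection */
    int out_queue;   /* messages waiting to be sent */
    int out_sent;    /* messages sent but not yet acknowledged */
};

enum msg_list { LIST_NONE, LIST_OUT_QUEUE, LIST_OUT_SENT };

struct message {
    struct connection *con;  /* non-NULL iff associated with a connection */
    enum msg_list list;      /* which outgoing list the message is on */
};

void con_get(struct connection *con) { con->refs++; }
void con_put(struct connection *con) { con->refs--; }

/* Outgoing: queueing creates the association and takes a reference. */
void con_send(struct connection *con, struct message *msg)
{
    assert(msg->con == NULL);
    con_get(con);
    msg->con = con;
    msg->list = LIST_OUT_QUEUE;
    con->out_queue++;
}

/* Outgoing: handed to the socket layer; still owned by the connection. */
void msg_mark_sent(struct message *msg)
{
    assert(msg->list == LIST_OUT_QUEUE);
    msg->con->out_queue--;
    msg->con->out_sent++;
    msg->list = LIST_OUT_SENT;
}

/* Outgoing: removal from either list ends the association. */
void msg_remove(struct message *msg)
{
    struct connection *con = msg->con;

    assert(con != NULL && msg->list != LIST_NONE);
    if (msg->list == LIST_OUT_QUEUE)
        con->out_queue--;
    else
        con->out_sent--;
    msg->list = LIST_NONE;
    msg->con = NULL;
    con_put(con);
}

/* Incoming: allocation associates the message with the connection. */
void con_in_msg_alloc(struct connection *con, struct message *msg)
{
    con_get(con);
    msg->con = con;
    msg->list = LIST_NONE;
}

/* Incoming: dispatch to the upper layer ends the association. */
void msg_dispatch(struct message *msg)
{
    struct connection *con = msg->con;

    msg->con = NULL;
    con_put(con);
}
```

In this model the invariant is that `msg->con` is non-NULL exactly while the message holds a reference, matching the "if and only if" wording of the commit message.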
    • libceph: have messages point to their connection · 38941f80
      Alex Elder committed
      When a ceph message is queued for sending it is placed on a list of
      pending messages (ceph_connection->out_queue).  When it is
      actually sent over the wire, it is moved from that list to
      another (ceph_connection->out_sent).  When acknowledgement for the
      message is received, it is removed from the sent messages list.
      
      During that entire time the message is "in the possession" of a
      single ceph connection.  Keep track of that connection in the
      message.  This will be used in the next patch (and is a helpful
      bit of information for debugging anyway).
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      38941f80
    • libceph: tweak ceph_alloc_msg() · 1c20f2d2
      Alex Elder committed
      The function ceph_alloc_msg() is only used to allocate a message
      that will be assigned to a connection's in_msg pointer.  Rename the
      function so this implied usage is more clear.
      
      In addition, make that assignment inside the function (again, since
      that's precisely what it's intended to be used for).  This allows us
      to return what is now provided via the passed-in address of a "skip"
      variable.  The return type is now Boolean to be explicit that there
      are only two possible outcomes.
      
      Make sure the result of an ->alloc_msg method call always sets the
      value of *skip properly.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      1c20f2d2
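One way to picture the reshaped allocator is the user-space sketch below. It assumes, for illustration only, that a `true` return means "skip this message" and that the function stores its result in the connection's `in_msg` pointer. The hook `skip_odd_types()`, the integer message type, and the use of `malloc()` are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct message {
    int type;
};

struct connection {
    struct message *in_msg;   /* message being received, if any */
    /* Optional upper-layer hook; it must always set *skip. */
    struct message *(*alloc_msg)(struct connection *con, int type, int *skip);
};

/* Example hook: skip odd message types, otherwise decline to allocate. */
struct message *skip_odd_types(struct connection *con, int type, int *skip)
{
    (void)con;
    *skip = (type % 2 != 0);
    return NULL;
}

/*
 * Allocate the incoming message and store it in con->in_msg.
 * Returns true if the message should be skipped (in_msg stays NULL),
 * false otherwise (in_msg is set, or NULL if allocation failed).
 */
bool con_in_msg_alloc(struct connection *con, int type)
{
    int skip = 0;

    con->in_msg = NULL;
    if (con->alloc_msg)
        con->in_msg = con->alloc_msg(con, type, &skip);

    if (skip) {
        /* The hook is expected to return NULL when it sets *skip. */
        con->in_msg = NULL;
        return true;
    }

    if (con->in_msg == NULL) {
        /* Fall back to a default allocation. */
        con->in_msg = malloc(sizeof(*con->in_msg));
        if (con->in_msg)
            con->in_msg->type = type;
    }
    return false;
}
```

Returning a Boolean and assigning `in_msg` inside the function leaves the caller with a single two-way branch, instead of having to inspect both a returned pointer and a separate `skip` variable.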
    • libceph: fully initialize connection in con_init() · 1bfd89f4
      Alex Elder committed
      Move the initialization of a ceph connection's private pointer,
      operations vector pointer, and peer name information into
      ceph_con_init().  Rearrange the arguments so the connection pointer
      is first.  Hide the byte-swapping of the peer entity number inside
      ceph_con_init().
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      1bfd89f4
  5. 01 Jun, 2012 8 commits
  6. 19 May, 2012 1 commit
    • ceph: add auth buf in prepare_write_connect() · 3da54776
      Alex Elder committed
      Move the addition of the authorizer buffer to a connection's
      out_kvec out of get_connect_authorizer() and into its caller.  This
      way, the caller--prepare_write_connect()--can avoid adding the
      connect header to out_kvec before it has been fully initialized.
      
      Prior to this patch, it was possible for a connect header to be
      sent over the wire before the authorizer protocol or buffer length
      fields were initialized.  An authorizer buffer associated with that
      header could also be queued to send only after the connection header
      that describes it was on the wire.
      
      Fixes http://tracker.newdream.net/issues/2424
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      3da54776
  7. 17 May, 2012 12 commits
  8. 15 May, 2012 3 commits
  9. 16 Apr, 2012 1 commit
  10. 22 Mar, 2012 6 commits
    • libceph: isolate kmap() call in write_partial_msg_pages() · 8d63e318
      Alex Elder committed
      In write_partial_msg_pages(), every case now does an identical call
      to kmap(page).  Instead, just call it once inside the CRC-computing
      block where it's needed.  Move the definition of kaddr inside that
      block, and make it a (char *) to ensure portable pointer arithmetic.
      
      We still don't kunmap() it until after the sendpage() call, in case
      that also ends up needing to use the mapping.
      Signed-off-by: Alex Elder <elder@dreamhost.com>
      Reviewed-by: Sage Weil <sage@newdream.net>
      8d63e318
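The remark about making kaddr a (char *) comes down to standard C: arithmetic on a `void *` is a compiler extension, while arithmetic on a byte pointer is well defined. The sketch below illustrates this with a plain bitwise CRC-32C written for this example. It is not the kernel's crc32c() helper, and `crc32c_region()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
uint32_t crc32c_bytes(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    size_t i;
    int bit;

    for (i = 0; i < len; i++) {
        crc ^= data[i];
        for (bit = 0; bit < 8; bit++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0x82F63B78u;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/*
 * Checksum len bytes starting offset bytes into a mapped page.
 * The mapping is converted to a byte pointer first, so that
 * "kaddr + offset" is portable pointer arithmetic.
 */
uint32_t crc32c_region(const void *page, size_t offset, size_t len)
{
    const unsigned char *kaddr = page;

    return crc32c_bytes(kaddr + offset, len);
}
```

Declaring the byte pointer inside the function that needs it mirrors the commit's idea of keeping the mapping local to the CRC-computing block.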
    • libceph: rename "page_shift" variable to something sensible · 9bd19663
      Alex Elder committed
      In write_partial_msg_pages() there is a local variable used to
      track the starting offset within a bio segment to use.  Its name,
      "page_shift", defies the Linux convention of using that name for
      log-base-2(page size).
      
      Since it's only used in the bio case, rename it "bio_offset".  Use it
      along with the page_pos field to compute the memory offset when
      computing CRCs in that function.  This makes the bio case match the
      others more closely.
      Signed-off-by: Alex Elder <elder@dreamhost.com>
      Reviewed-by: Sage Weil <sage@newdream.net>
      9bd19663
    • libceph: get rid of zero_page_address · 0cdf9e60
      Alex Elder committed
      There's not a lot of benefit to zero_page_address, which basically
      holds a mapping of the zero page through the life of the messenger
      module.  Even with our own mapping, the sendpage interface where
      it's used may need to kmap() it again.  It's almost certain to
      be in low memory anyway.
      
      So stop treating the zero page specially in write_partial_msg_pages()
      and just get rid of zero_page_address entirely.
      Signed-off-by: Alex Elder <elder@dreamhost.com>
      Reviewed-by: Sage Weil <sage@newdream.net>
      0cdf9e60
    • libceph: only call kernel_sendpage() via helper · e36b13cc
      Alex Elder committed
      Make ceph_tcp_sendpage() be the only place kernel_sendpage() is
      used, by using this helper in write_partial_msg_pages().
      Signed-off-by: Alex Elder <elder@dreamhost.com>
      Reviewed-by: Sage Weil <sage@newdream.net>
      e36b13cc
    • libceph: use kernel_sendpage() for sending zeroes · 31739139
      Alex Elder committed
      If a message queued for send gets revoked, zeroes are sent over the
      wire instead of any unsent data.  This is done by constructing a
      message and passing it to kernel_sendmsg() via ceph_tcp_sendmsg().
      
      Since we are already working with a page in this case we can use
      the sendpage interface instead.  Create a new ceph_tcp_sendpage()
      helper that sets up flags to match the way ceph_tcp_sendmsg()
      does now.
      Signed-off-by: Alex Elder <elder@dreamhost.com>
      Reviewed-by: Sage Weil <sage@newdream.net>
      31739139
    • libceph: fix inverted crc option logic · 37675b0f
      Alex Elder committed
      CRCs are computed for all messages between ceph entities.  The CRC
      computation for the data portion of a message can optionally be
      disabled using the "nocrc" (common) ceph option.  The default is
      for CRC computation for the data portion to be enabled.
      
      Unfortunately, the code that implements this feature interprets the
      feature flag incorrectly, meaning that by default the CRCs have *not*
      been computed (or checked) for the data portion of messages unless
      the "nocrc" option was supplied.
      
      Fix this, in write_partial_msg_pages() and read_partial_message().
      Also change the flag variable in write_partial_msg_pages() to be
      "no_datacrc" to match the usage elsewhere in the file.
      
      This fixes http://tracker.newdream.net/issues/2064
      Signed-off-by: Alex Elder <elder@dreamhost.com>
      Reviewed-by: Sage Weil <sage@newdream.net>
      37675b0f
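The inverted-flag bug described above is easy to reproduce in miniature. In the sketch below the flag bit `OPT_NOCRC` and both helper functions are hypothetical; they only illustrate how reading a "no CRC" option as "do CRC" flips the default behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical option bit standing in for the "nocrc" ceph option. */
#define OPT_NOCRC (1u << 0)

/* Buggy reading: data CRCs are computed only when "nocrc" is set. */
bool buggy_do_datacrc(unsigned int options)
{
    return (options & OPT_NOCRC) != 0;
}

/* Fixed reading: derive no_datacrc from the option, then negate it. */
bool fixed_do_datacrc(unsigned int options)
{
    bool no_datacrc = (options & OPT_NOCRC) != 0;

    return !no_datacrc;
}
```

Naming the local variable `no_datacrc` after the option's sense, as the commit does, keeps the single negation visible at the point of use.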