1. 18 Jan 2013, 7 commits
    • libceph: pass length to ceph_calc_file_object_mapping() · e8afad65
      Committed by Alex Elder
      ceph_calc_file_object_mapping() takes (among other things) a "file"
      offset and length, and based on the layout, determines the object
      number ("bno") backing the affected portion of the file's data and
      the offset into that object where the desired range begins.  It also
      computes the size that should be used for the request--either the
      amount requested or something less if that would exceed the end of
      the object.
      
      This patch changes the input length parameter in this function so it
      is used only for input.  That is, the argument will be passed by
      value rather than by address, so the value provided won't get
      updated by the function.
      
      The value would only get updated if the length would surpass the
      current object, and in that case the value it got updated to would
      be exactly that returned in *oxlen.
      
      Only one of the two callers is affected by this change.  Update
      ceph_calc_raw_layout() so it records any updated value.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
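The by-value length described in the commit above can be illustrated with a minimal sketch. It assumes a simplified layout with no striping (one object per `object_size` bytes); `struct simple_layout` and `calc_object_mapping()` are hypothetical names, not the kernel's interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified layout: one object per object_size bytes. */
struct simple_layout {
    uint64_t object_size;   /* bytes per object, must be non-zero */
};

/*
 * Map a file range onto the object that holds its first byte.
 * len is passed by value, so the caller's copy is never modified;
 * the possibly shortened length is reported only through *oxlen.
 */
static void calc_object_mapping(const struct simple_layout *layout,
                                uint64_t off, uint64_t len,
                                uint64_t *bno, uint64_t *oxoff,
                                uint64_t *oxlen)
{
    uint64_t room;

    assert(layout->object_size > 0);

    *bno = off / layout->object_size;       /* object number */
    *oxoff = off % layout->object_size;     /* offset inside that object */
    room = layout->object_size - *oxoff;    /* bytes left in the object */
    *oxlen = len < room ? len : room;       /* clamp at the object's end */
}
```

A caller that wants the clamped value, as ceph_calc_raw_layout() does after this change, would copy `*oxlen` back into its own length variable.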
    • libceph: pass length to ceph_osdc_build_request() · 0120be3c
      Committed by Alex Elder
      The len argument to ceph_osdc_build_request() is set up to be
      passed by address, but that function never updates its value
      so there's no need to do this.  Tighten up the interface by
      passing the length directly.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: kill op_needs_trail() · 5b9d1b1c
      Committed by Alex Elder
      Since every osd message is now prepared to include trailing data,
      there's no need to check ahead of time whether any operations will
      make use of the trail portion of the message.
      
      We can drop the second argument to get_num_ops(), and as a result we
      can also get rid of op_needs_trail() which is no longer used.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: always allow trail in osd request · c885837f
      Committed by Alex Elder
      An osd request structure contains an optional trail portion, which
      if present will contain data to be passed in the payload portion of
      the message containing the request.  The trail field is a
      ceph_pagelist pointer, and if null it indicates there is no trail.
      
      A ceph_pagelist structure contains a length field, and it can
      legitimately hold the value 0.  Make use of this to change the
      interpretation of the "trail" of an osd request so that every osd
      request has trailing data, though it might have length 0.
      
      This means we change the r_trail field in a ceph_osd_request
      structure from a pointer to a structure that is always initialized.
      
      Note that in ceph_osdc_start_request(), the trail pointer (or now
      address of that structure) is assigned to a ceph message's trail
      field.  Here's why that's still OK (looking at net/ceph/messenger.c):
          - What would have resulted in a null pointer previously will now
            refer to a 0-length page list.  That message trail pointer
            is used in two functions, write_partial_msg_pages() and
            out_msg_pos_next().
          - In write_partial_msg_pages(), a null page list pointer is
            handled the same as a message with 0-length trail, and both
            result in an "in_trail" variable set to false.  The trail
            pointer is only used if in_trail is true.
          - The only other place the message trail pointer is used is
            out_msg_pos_next().  That function is only called by
            write_partial_msg_pages() and only touches the trail pointer
            if the in_trail value it is passed is true.
      Therefore a null ceph_msg->trail pointer is equivalent to a non-null
      pointer referring to a 0-length page list structure.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
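The equivalence argued in the commit above (a null trail pointer behaves like a 0-length trail) can be modelled in a few lines. This is a sketch with hypothetical stand-in types; the real in_trail computation in net/ceph/messenger.c also depends on how much of the message has already been written, which is left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for ceph_pagelist and ceph_osd_request. */
struct pagelist {
    size_t length;              /* bytes of data held; 0 is a valid value */
};

struct osd_request {
    struct pagelist r_trail;    /* embedded, so it always exists */
};

static void pagelist_init(struct pagelist *pl)
{
    pl->length = 0;
}

static void osd_request_init(struct osd_request *req)
{
    assert(req != NULL);
    pagelist_init(&req->r_trail);
}

/*
 * A null trail pointer and a zero-length trail are treated alike:
 * neither one puts the writer "in the trail".
 */
static bool in_trail(const struct pagelist *trail)
{
    return trail != NULL && trail->length > 0;
}
```

Because the trail is embedded rather than pointed to, code that builds a request no longer needs a separate allocation or a null check before appending trailing data.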
    • rbd: drop oid parameters from ceph_osdc_build_request() · af77f26c
      Committed by Alex Elder
      The last two parameters to ceph_osdc_build_request() describe the
      object id, but the values passed always come from the osd request
      structure whose address is also provided.  Get rid of those last
      two parameters.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: reformat __reset_osd() · c3acb181
      Committed by Alex Elder
      Reformat __reset_osd() into three distinct blocks of code
      handling the three return cases.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • ceph: re-calculate truncate_size for strip object · a41bad1a
      Committed by Yan, Zheng
      Otherwise the osd may truncate the object to a larger size.
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
  2. 28 Dec 2012, 2 commits
    • libceph: always reset osds when kicking · e6d50f67
      Committed by Alex Elder
      When ceph_osdc_handle_map() is called to process a new osd map,
      kick_requests() is called to ensure all affected requests are
      updated if necessary to reflect changes in the osd map.  This
      happens in two cases:  whenever an incremental map update is
      processed; and when a full map update (or the last one if there is
      more than one) gets processed.
      
      In the former case, the kick_requests() call is followed immediately
      by a call to reset_changed_osds() to ensure any connections to osds
      affected by the map change are reset.  But for full map updates
      this isn't done.
      
      Both cases should be doing this osd reset.
      
      Rather than duplicating the reset_changed_osds() call, move it into
      the end of kick_requests().
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • libceph: move linger requests sooner in kick_requests() · ab60b16d
      Committed by Alex Elder
      The kick_requests() function is called by ceph_osdc_handle_map()
      when an osd map change has been indicated.  Its purpose is to
      re-queue any request whose target osd is different from what it
      was when it was originally sent.
      
      It is structured as two loops, one for incomplete but registered
      requests, and a second for handling completed linger requests.
      As a special case, in the first loop if a request marked to linger
      has not yet completed, it is moved from the request list to the
      linger list.  This is a quick and dirty way to have the second
      loop handle sending the request along with all the other linger
      requests.
      
      Because of the way it's done now, however, this quick and dirty
      solution can result in these incomplete linger requests never
      getting re-sent as desired.  The problem lies in the fact that
      the second loop only arranges for a linger request to be sent
      if it appears its target osd has changed.  This is the proper
      handling for *completed* linger requests (it avoids issuing
      the same linger request twice to the same osd).
      
      But although the linger requests added to the list in the first loop
      may have been sent, they have not yet completed, so they need to be
      re-sent regardless of whether their target osd has changed.
      
      The first required fix is to avoid calling __map_request()
      on any incomplete linger request.  Otherwise the subsequent
      __map_request() call in the second loop will find the target osd
      has not changed and will therefore not re-send the request.
      
      Second, we need to be sure that a sent but incomplete linger request
      gets re-sent.  If the target osd is the same with the new osd map as
      it was when the request was originally sent, this won't happen.
      This can be fixed through careful handling when we move these
      requests from the request list to the linger list, by unregistering
      the request *before* it is registered as a linger request.  This
      works because a side-effect of unregistering the request is to make
      the request's r_osd pointer be NULL, and *that* will ensure the
      second loop actually re-sends the linger request.
      
      Processing of such a request is done at that point, so continue with
      the next one once it's been moved.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
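The two-loop handling described in the commit above can be sketched as a small model. Everything here is hypothetical and much simpler than the kernel code: requests live in one array, an integer stands in for the r_osd pointer (with `NO_OSD` modelling NULL), and `new_target` is the osd the new map would select.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_OSD (-1)     /* models a NULL r_osd pointer */

/* Hypothetical model of a request; not the kernel's ceph_osd_request. */
struct request {
    int  r_osd;             /* osd the request was last sent to, or NO_OSD */
    int  new_target;        /* osd chosen by the new map (model input) */
    bool linger;            /* marked to linger */
    bool on_linger_list;    /* already registered as a linger request */
    int  sends;             /* how many times it was (re)sent */
};

/* Models __map_request(): returns true if the target osd changed. */
static bool map_request(struct request *req)
{
    if (req->r_osd == req->new_target)
        return false;
    req->r_osd = req->new_target;
    return true;
}

/* Models unregistering: a side effect is that r_osd becomes "NULL". */
static void unregister_request(struct request *req)
{
    req->r_osd = NO_OSD;
}

static void kick_requests(struct request *reqs, size_t n)
{
    size_t i;

    assert(reqs != NULL || n == 0);

    /* Loop 1: registered requests that have not completed. */
    for (i = 0; i < n; i++) {
        struct request *req = &reqs[i];

        if (req->on_linger_list)
            continue;
        if (req->linger) {
            /* Unregister first, then register as linger; skip mapping. */
            unregister_request(req);
            req->on_linger_list = true;
            continue;
        }
        if (map_request(req))
            req->sends++;
    }

    /* Loop 2: linger requests are re-sent only if the osd changed. */
    for (i = 0; i < n; i++) {
        struct request *req = &reqs[i];

        if (req->on_linger_list && map_request(req))
            req->sends++;
    }
}
```

In this model an incomplete linger request always reaches the second loop with `r_osd == NO_OSD`, so the "target changed" test succeeds and the request is re-sent even when the new map selects the same osd as before.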
  3. 21 Dec 2012, 4 commits
  4. 18 Dec 2012, 1 commit
    • rbd: remove linger unconditionally · 61c74035
      Committed by Alex Elder
      In __unregister_linger_request(), the request is being removed
      from the osd client's req_linger list only when the request
      has a non-null osd pointer.  It should be done whether or not
      the request currently has an osd.
      
      This is most likely a non-issue because I believe the request
      will always have an osd when this function is called.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
  5. 17 Dec 2012, 2 commits
    • libceph: avoid using freed osd in __kick_osd_requests() · 685a7555
      Committed by Alex Elder
      If an osd has no requests and no linger requests, __reset_osd()
      will just remove it with a call to __remove_osd().  That drops
      a reference to the osd, and therefore the osd may have been freed
      by the time __reset_osd() returns.  That function offers no
      indication this may have occurred, and as a result the osd will
      continue to be used even when it's no longer valid.
      
      Change __reset_osd() so it returns an error (ENODEV) when it
      deletes the osd being reset.  And change __kick_osd_requests() so it
      returns immediately (before referencing osd again) if __reset_osd()
      returns *any* error.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
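The error-return pattern from the commit above can be sketched as follows. The types and functions are hypothetical models: the real __kick_osd_requests() returns nothing, and the integer result here exists only so the early return is observable.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical model of an osd; not the kernel's struct ceph_osd. */
struct osd {
    int nr_requests;    /* pending requests */
    int nr_linger;      /* pending linger requests */
    int resets;         /* times the connection was reset */
};

static struct osd *osd_alloc(int nr_requests, int nr_linger)
{
    struct osd *osd = calloc(1, sizeof(*osd));

    assert(osd != NULL);
    osd->nr_requests = nr_requests;
    osd->nr_linger = nr_linger;
    return osd;
}

/*
 * Models __reset_osd(): an idle osd is removed (and here freed),
 * and the caller is told so through -ENODEV.
 */
static int reset_osd(struct osd *osd)
{
    if (osd->nr_requests == 0 && osd->nr_linger == 0) {
        free(osd);      /* osd must not be touched after this */
        return -ENODEV;
    }
    osd->resets++;
    return 0;
}

/* Models __kick_osd_requests(): stop on any error from reset_osd(). */
static int kick_osd_requests(struct osd *osd)
{
    int err = reset_osd(osd);

    if (err)
        return err;     /* osd may be gone; do not dereference it */

    /* Only reached while osd is still valid. */
    return osd->nr_requests + osd->nr_linger;
}
```

Checking for any error, rather than only ENODEV, keeps the caller safe if the reset path later gains other failure modes.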
    • ceph: don't reference req after put · 7d5f2481
      Committed by Alex Elder
      In __unregister_request(), there is a call to list_del_init()
      referencing a request that was the subject of a call to
      ceph_osdc_put_request() on the previous line.  This is not
      safe, because the request structure could have been freed
      by the time we reach the list_del_init().
      
      Fix this by reversing the order of these lines.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
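The ordering fix in the commit above (unlink first, drop the reference last) can be shown with a minimal sketch. The list helpers imitate the style of the kernel's list_head but are written from scratch here, and `struct request`, `put_request()` and `freed_requests` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal circular doubly linked list, in the style of list_head. */
struct list_node {
    struct list_node *prev, *next;
};

static void list_init(struct list_node *n)
{
    n->prev = n->next = n;
}

static void list_add(struct list_node *n, struct list_node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

static void list_del_init(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/* Hypothetical refcounted request. */
struct request {
    int refs;
    struct list_node item;
};

static int freed_requests;  /* counts frees, for demonstration only */

static void put_request(struct request *req)
{
    assert(req->refs > 0);
    if (--req->refs == 0) {
        free(req);
        freed_requests++;
    }
}

static void unregister_request(struct request *req)
{
    list_del_init(&req->item);  /* last use of req ... */
    put_request(req);           /* ... before the reference is dropped */
}
```

With the two calls in the opposite order, `list_del_init()` would write through `req` after `put_request()` may already have freed it.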
  6. 13 Dec 2012, 1 commit
    • libceph: remove 'osdtimeout' option · 83aff95e
      Committed by Sage Weil
      This would reset a connection with any OSD that had an outstanding
      request that was taking more than N seconds.  The idea was that if the
      OSD was buggy, the client could compensate by resending the request.
      
      In reality, this only served to hide server bugs, and we haven't
      actually seen such a bug in quite a while.  Moreover, the userspace
      client code never did this.
      
      More importantly, often the request is taking a long time because the
      OSD is trying to recover, or overloaded, and killing the connection
      and retrying would only make the situation worse by giving the OSD
      more work to do.
      Signed-off-by: Sage Weil <sage@inktank.com>
      Reviewed-by: Alex Elder <elder@inktank.com>
  7. 02 Oct 2012, 2 commits
  8. 31 Jul 2012, 5 commits
  9. 06 Jul 2012, 1 commit
  10. 20 Jun 2012, 2 commits
    • libceph: use con get/put ops from osd_client · 88ed6ea0
      Committed by Sage Weil
      There were a few direct calls to ceph_con_{get,put}() instead of the con
      ops from osd_client.c.  This is a bug since those ops aren't defined to
      be ceph_con_get/put.
      
      This breaks refcounting on the ceph_osd structs that contain the
      ceph_connections, and could lead to all manner of strangeness.
      
      The purpose of the ->get and ->put methods in a ceph connection is
      to allow the connection to indicate it has a reference to something
      external to the messaging system, *not* to indicate something
      external has a reference to the connection.
      
      [elder@inktank.com: added that last sentence]
      Signed-off-by: Sage Weil <sage@newdream.net>
      Reviewed-by: Alex Elder <elder@inktank.com>
      (cherry picked from commit 0d47766f)
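The role of the ->get and ->put methods described in the commit above can be sketched with an ops vector whose callbacks adjust the owner's reference count. All names below are hypothetical; in particular the `priv` back-pointer is an assumed stand-in for the connection's private pointer.

```c
#include <assert.h>
#include <stddef.h>

struct connection;

/* Hypothetical ops vector supplied by the connection's owner. */
struct connection_ops {
    struct connection *(*get)(struct connection *con);
    void (*put)(struct connection *con);
};

struct connection {
    void *priv;                         /* owner's data, here the osd */
    const struct connection_ops *ops;
};

/* The osd embeds its connection and carries the refcount that matters. */
struct osd {
    int o_ref;
    struct connection o_con;
};

/* Taking a reference through the connection pins the owning osd. */
static struct connection *osd_con_get(struct connection *con)
{
    struct osd *osd = con->priv;

    osd->o_ref++;
    return con;
}

static void osd_con_put(struct connection *con)
{
    struct osd *osd = con->priv;

    assert(osd->o_ref > 0);
    osd->o_ref--;
}

static const struct connection_ops osd_con_ops = {
    .get = osd_con_get,
    .put = osd_con_put,
};

static void osd_init(struct osd *osd)
{
    osd->o_ref = 1;
    osd->o_con.priv = osd;
    osd->o_con.ops = &osd_con_ops;
}
```

Code that holds only a connection pointer calls `con->ops->get(con)` and `con->ops->put(con)`, so the count that changes is the one on the structure containing the connection, not a count private to the messenger.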
    • libceph: osd_client: don't drop reply reference too early · 680584fa
      Committed by Alex Elder
      In ceph_osdc_release_request(), a reference to the r_reply message
      is dropped.  But just after that, that same message is revoked if it
      was in use to receive an incoming reply.  Reorder these so we are
      sure we hold a reference until we're actually done with the message.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      (cherry picked from commit ab8cb34a)
  11. 06 Jun 2012, 6 commits
    • libceph: make ceph_con_revoke_message() a msg op · 8921d114
      Committed by Alex Elder
      ceph_con_revoke_message() is passed both a message and a ceph
      connection.  A ceph_msg allocated for incoming messages on a
      connection always has a pointer to that connection, so there's no
      need to provide the connection when revoking such a message.
      
      Note that the existing logic does not preclude the message supplied
      being a null/bogus message pointer.  The only user of this interface
      is the OSD client, and the only value an osd client passes is a
      request's r_reply field.  That is always non-null (except briefly in
      an error path in ceph_osdc_alloc_request(), and that drops the
      only reference so the request won't ever have a reply to revoke).
      So we can safely assume the passed-in message is non-null, but add a
      BUG_ON() to make it very obvious we are imposing this restriction.
      
      Rename the function ceph_msg_revoke_incoming() to reflect that it is
      really an operation on an incoming message.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • libceph: make ceph_con_revoke() a msg operation · 6740a845
      Committed by Alex Elder
      ceph_con_revoke() is passed both a message and a ceph connection.
      Now that any message associated with a connection holds a pointer
      to that connection, there's no need to provide the connection when
      revoking a message.
      
      This has the added benefit of precluding the possibility of
      providing the wrong connection pointer.  If the message's connection
      pointer is null, it is not being tracked by any connection, so
      revoking it is a no-op.  This is supported as a convenience for
      upper layers, so they can revoke a message that is not actually
      "in flight."
      
      Rename the function ceph_msg_revoke() to reflect that it is really
      an operation on a message, not a connection.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
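The message-centric revoke described in the commit above can be modelled briefly. This is a sketch with hypothetical types; the boolean return value is added only to make the no-op case visible and is not part of the interface the commit describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct connection;

/* Hypothetical model of a message that remembers its connection. */
struct message {
    struct connection *con;     /* NULL when no connection tracks it */
};

struct connection {
    struct message *out_msg;    /* message currently being sent, if any */
};

/* Hand a message to a connection for sending. */
static void con_send(struct connection *con, struct message *msg)
{
    assert(msg->con == NULL);
    msg->con = con;
    con->out_msg = msg;
}

/*
 * Revoke takes only the message; the connection is found through it.
 * Returns true if something was revoked, false for the no-op case.
 */
static bool msg_revoke(struct message *msg)
{
    struct connection *con = msg->con;

    if (con == NULL)
        return false;           /* not in flight: nothing to do */

    if (con->out_msg == msg)
        con->out_msg = NULL;
    msg->con = NULL;
    return true;
}
```

Since the connection is read from the message itself, a caller cannot pass a connection that does not match the message.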
    • libceph: tweak ceph_alloc_msg() · 1c20f2d2
      Committed by Alex Elder
      The function ceph_alloc_msg() is only used to allocate a message
      that will be assigned to a connection's in_msg pointer.  Rename the
      function so this implied usage is clearer.
      
      In addition, make that assignment inside the function (again, since
      that's precisely what it's intended to be used for).  This allows us
      to return what is now provided via the passed-in address of a "skip"
      variable.  The return type is now Boolean to be explicit that there
      are only two possible outcomes.
      
      Make sure the result of an ->alloc_msg method call always sets the
      value of *skip properly.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • libceph: fully initialize connection in con_init() · 1bfd89f4
      Committed by Alex Elder
      Move the initialization of a ceph connection's private pointer,
      operations vector pointer, and peer name information into
      ceph_con_init().  Rearrange the arguments so the connection pointer
      is first.  Hide the byte-swapping of the peer entity number inside
      ceph_con_init().
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • libceph: use con get/put ops from osd_client · 0d47766f
      Committed by Sage Weil
      There were a few direct calls to ceph_con_{get,put}() instead of the con
      ops from osd_client.c.  This is a bug since those ops aren't defined to
      be ceph_con_get/put.
      
      This breaks refcounting on the ceph_osd structs that contain the
      ceph_connections, and could lead to all manner of strangeness.
      
      The purpose of the ->get and ->put methods in a ceph connection is
      to allow the connection to indicate it has a reference to something
      external to the messaging system, *not* to indicate something
      external has a reference to the connection.
      
      [elder@inktank.com: added that last sentence]
      Signed-off-by: Sage Weil <sage@newdream.net>
      Reviewed-by: Alex Elder <elder@inktank.com>
    • libceph: osd_client: don't drop reply reference too early · ab8cb34a
      Committed by Alex Elder
      In ceph_osdc_release_request(), a reference to the r_reply message
      is dropped.  But just after that, that same message is revoked if it
      was in use to receive an incoming reply.  Reorder these so we are
      sure we hold a reference until we're actually done with the message.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
  12. 01 Jun 2012, 2 commits
  13. 19 May 2012, 1 commit
  14. 17 May 2012, 4 commits