1. 02 May, 2013 40 commits
    • rbd: define image request originator flag · 9849e986
      Alex Elder authored
      Define a flag indicating whether an image request originated from
      the Linux block layer (from blk_fetch_request()) or whether it was
      initiated in order to satisfy an object request for a child image
      of a layered rbd device.  For image requests initiated by objects of
      child images we'll save a pointer to the object request rather than
      the Linux block request.
      
      For now, only block requests are used.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: define image request flags · 0c425248
      Alex Elder authored
      There are several Boolean values we'll be maintaining for image
      requests.  Switch from the single write_request field to a
      general-purpose flags field, and use one of its bits to represent
      the direction of I/O for the image request.  Define helper functions
      for setting and testing that flag.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
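The flags field and its helpers described in this commit might look roughly like the sketch below. This is only an illustration: the type name `img_request`, the bit names `IMG_REQ_WRITE` and `IMG_REQ_CHILD`, and the helper names are hypothetical and are not taken from the rbd source.

```c
#include <stdbool.h>

/* Hypothetical bit positions; the real names and values may differ. */
enum img_req_flag {
    IMG_REQ_WRITE = 0,  /* set: write request; clear: read request */
    IMG_REQ_CHILD = 1,  /* set: initiated on behalf of a child image */
};

/* Simplified stand-in for an image request. */
struct img_request {
    unsigned long flags;  /* replaces a single Boolean write_request field */
};

/* Mark the image request as a write. */
static inline void img_request_write_set(struct img_request *req)
{
    req->flags |= 1UL << IMG_REQ_WRITE;
}

/* Report whether the image request is a write. */
static inline bool img_request_write_test(const struct img_request *req)
{
    return (req->flags & (1UL << IMG_REQ_WRITE)) != 0;
}
```

Keeping every Boolean in one word means a new property only costs a new bit and a pair of helpers, rather than another structure field.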
    • rbd: record image-relative offset in object requests · 7da22d29
      Alex Elder authored
      For an image object request we will need to know what offset within
      the rbd image the request covers.  Record that when the object
      request gets created.
      
      Update the I/O error warnings to use this offset, so that what's
      reported is more informative.
      
      Rename a local variable to fit the convention used everywhere else.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: record aggregate image transfer count · 55f27e09
      Alex Elder authored
      Compute the total number of bytes transferred for an image
      request--the sum across each of the request's object requests.
      To avoid contention do it only when all object requests are
      complete, in rbd_img_request_complete().
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: record overall image request result · a5a337d4
      Alex Elder authored
      If any image object request produces a non-zero result, preserve
      that as the result of the overall image request.  If multiple
      objects have non-zero results, save only the first one.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
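The bookkeeping in this commit and in "rbd: record aggregate image transfer count" can be sketched as a single pass over the object requests. The structure and function names below are hypothetical, and the sketch walks an array in index order, whereas the driver presumably records results as object requests complete.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for one object request's outcome. */
struct obj_request {
    int result;        /* 0 on success, negative errno-style value on error */
    uint64_t xferred;  /* bytes transferred by this object request */
};

/* Overall outcome of the image request. */
struct img_totals {
    int result;        /* first non-zero object result, or 0 */
    uint64_t xferred;  /* sum of bytes across all object requests */
};

/* Summarize all object requests once they are complete. */
static inline struct img_totals
img_request_summarize(const struct obj_request *objs, size_t count)
{
    struct img_totals t = { 0, 0 };

    for (size_t i = 0; i < count; i++) {
        /* Preserve only the first non-zero result. */
        if (objs[i].result && !t.result)
            t.result = objs[i].result;
        t.xferred += objs[i].xferred;
    }
    return t;
}
```

Summing only after every object request has finished avoids having concurrent completions update a shared counter.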
    • rbd: update feature bits · 5cbf6f12
      Alex Elder authored
      There is a new rbd feature bit defined for "fancy striping." Add
      it to the ones defined in the kernel client.
      
      Change RBD_FEATURES_ALL so it represents the set of all feature
      bits (rather than just the ones we support).  Define a new symbol
      RBD_FEATURES_SUPPORTED to indicate the supported ones.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
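The split between "all defined" and "supported" feature bits can be illustrated as follows. The bit assignments and the contents of the supported set here are assumptions made for the example, and `rbd_features_ok()` is a hypothetical helper, not a function named by the commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit assignments, for illustration only. */
#define RBD_FEATURE_LAYERING    (1ULL << 0)
#define RBD_FEATURE_STRIPINGV2  (1ULL << 1)   /* "fancy striping" */

/* Every feature bit that is defined. */
#define RBD_FEATURES_ALL \
    (RBD_FEATURE_LAYERING | RBD_FEATURE_STRIPINGV2)

/* The subset this client can handle (assumed: layering only). */
#define RBD_FEATURES_SUPPORTED  (RBD_FEATURE_LAYERING)

/* An image is usable only if it needs no unsupported feature. */
static inline bool rbd_features_ok(uint64_t features)
{
    return (features & ~RBD_FEATURES_SUPPORTED) == 0;
}
```

With two symbols, a client can recognize a feature bit by name while still refusing images that require it.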
    • libceph: make method call data be a separate data item · 04017e29
      Alex Elder authored
      Right now the data for a method call is specified via a pointer and
      length, and it's copied--along with the class and method name--into
      a pagelist data item to be sent to the osd.  Instead, encode the
      data in a data item separate from the class and method names.
      
      This will allow large amounts of data to be supplied to methods
      without copying.  Only rbd uses the class functionality right now,
      and when it really needs this it will probably need to use a page
      array rather than a page list.  But this simple implementation
      demonstrates the functionality on the osd client, and that's enough
      for now.
      
      This resolves:
          http://tracker.ceph.com/issues/4104
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: add, don't set data for a message · 90af3602
      Alex Elder authored
      Change the names of the functions that put data on a pagelist to
      reflect that we're adding to whatever's already there rather than
      just setting it to the one thing.  Currently only one data item is
      ever added to a message, but that's about to change.
      
      This resolves:
          http://tracker.ceph.com/issues/2770
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: implement multiple data items in a message · ca8b3a69
      Alex Elder authored
      This patch adds support to the messenger for more than one data item
      in its data list.
      
      A message data cursor has two more fields to support this:
          - a count of the number of bytes left to be consumed across
            all data items in the list, "total_resid"
          - a pointer to the head of the list (for validation only)
      
      The cursor initialization routine has been split into two parts: the
      outer one, which initializes the cursor for traversing the entire
      list of data items; and the inner one, which initializes the cursor
      to start processing a single data item.
      
      When a message cursor is first initialized, the outer initialization
      routine sets total_resid to the length provided.  The data pointer
      is initialized to the first data item on the list.  From there, the
      inner initialization routine finishes by setting up to process the
      data item the cursor points to.
      
      Advancing the cursor consumes bytes in total_resid.  If the resid
      field reaches zero, it means the current data item is fully
      consumed.  If total_resid indicates there is more data, the cursor
      is advanced to point to the next data item, and then the inner
      initialization routine prepares for using that.  (A check is made at
      this point to make sure we don't wrap around the front of the list.)
      
      The type-specific init routines are modified so they can be given a
      length that's larger than what the data item can support.  The resid
      field is initialized to the smaller of the provided length and the
      length of the entire data item.
      
      When total_resid reaches zero, we're done.
      
      This resolves:
          http://tracker.ceph.com/issues/3761
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
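The outer/inner initialization and the advance step described in this commit can be sketched as below. This is a simplified model, not the messenger code: the type and function names are hypothetical, data items are reduced to a length, every item is assumed to be non-empty, and a NULL-terminated singly linked list stands in for the kernel's list (so the wrap-around check becomes a NULL check).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified data item: only its length matters here. */
struct data_item {
    size_t length;
    struct data_item *next;
};

struct cursor {
    size_t total_resid;      /* bytes left across all data items */
    size_t resid;            /* bytes left in the current data item */
    struct data_item *data;  /* current data item */
};

/* Inner init: prepare to process the item the cursor points to. */
static inline void cursor_item_init(struct cursor *c)
{
    size_t len = c->data->length;

    /* The requested length may exceed what this item can hold. */
    c->resid = c->total_resid < len ? c->total_resid : len;
}

/* Outer init: set up traversal of the whole list. */
static inline void cursor_init(struct cursor *c, struct data_item *head,
                               size_t length)
{
    c->total_resid = length;
    c->data = head;
    cursor_item_init(c);
}

/* Consume bytes; returns true while more data remains. */
static inline bool cursor_advance(struct cursor *c, size_t bytes)
{
    assert(bytes <= c->resid);
    c->resid -= bytes;
    c->total_resid -= bytes;

    if (!c->total_resid)
        return false;              /* all data consumed: done */
    if (!c->resid) {
        /* Current item exhausted; move on to the next one. */
        assert(c->data->next != NULL);
        c->data = c->data->next;
        cursor_item_init(c);
    }
    return true;
}
```

For example, with items of 5 and 3 bytes and a requested length of 7, the cursor starts with `resid` 5, moves to the second item after 5 bytes are consumed, and finishes after 2 more.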
    • libceph: replace message data pointer with list · 5240d9f9
      Alex Elder authored
      In place of the message data pointer, use a list head which links
      through message data items.  For now we only support a single entry
      on that list.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: have cursor point to data · 8ae4f4f5
      Alex Elder authored
      Rather than having a ceph message data item point to the cursor it's
      associated with, have the cursor point to a data item.  This will
      allow a message cursor to be used for more than one data item.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: move cursor into message · 36153ec9
      Alex Elder authored
      A message will only be processing a single data item at a time, so
      there's no need for each data item to have its own cursor.
      
      Move the cursor embedded in the message data structure into the
      message itself.  To minimize the impact, keep the data->cursor
      field, but make it be a pointer to the cursor in the message.
      
      Move the definition of ceph_msg_data above ceph_msg_data_cursor so
      the cursor can point to the data without a forward definition rather
      than vice-versa.
      
      This and the upcoming patches are part of:
          http://tracker.ceph.com/issues/3761
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: record bio length · c851c495
      Alex Elder authored
      The bio is the only data item type that doesn't record its full
      length.  Fix that.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: skip message if too big to receive · f759ebb9
      Alex Elder authored
      We know the length of our message buffers.  If we get a message
      that's too long, just dump it and ignore it.  If skip was set
      then con->in_msg won't be valid, so be careful not to dereference
      a null pointer in the process.
      
      This resolves:
          http://tracker.ceph.com/issues/4664
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: fix possible CONFIG_BLOCK build problem · ea96571f
      Alex Elder authored
      This patch:
          15a0d7b libceph: record message data length
      did not enclose some bio-specific code inside CONFIG_BLOCK as
      it should have.  Fix that.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: kill off osd request r_data_in and r_data_out · 5476492f
      Alex Elder authored
      Finally!  Convert the osd op data pointers into real structures, and
      make the switch over to using them instead of having all ops share
      the in and/or out data structures in the osd request.
      
      Set up a new function to traverse the set of ops and release any
      data associated with them (pages).
      
      This and the patches leading up to it resolve:
          http://tracker.ceph.com/issues/4657
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: set the data pointers when encoding ops · ec9123c5
      Alex Elder authored
      Still using the osd request r_data_in and r_data_out pointers, but
      we're basically only referring to them via the data pointers in the
      osd ops.  And we're transferring that information to the request
      or reply message only when the op indicates it's needed, in
      osd_req_encode_op().
      
      To avoid a forward reference, ceph_osdc_msg_data_set() was moved up
      in the file.
      
      Don't bother calling ceph_osd_data_init(), in ceph_osd_alloc(),
      because the ops array will already be zeroed anyway.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: combine initializing and setting osd data · a4ce40a9
      Alex Elder authored
      This ends up being a rather large patch but what it's doing is
      somewhat straightforward.
      
      Basically, this is replacing two calls with one.  The first of the
      two calls is initializing a struct ceph_osd_data with data (either a
      page array, a page list, or a bio list); the second is setting an
      osd request op so it associates that data with one of the op's
      parameters.  In place of those two will be a single function that
      initializes the op directly.
      
      That means we sort of fan out a set of the needed functions:
          - extent ops with pages data
          - extent ops with pagelist data
          - extent ops with bio list data
      and
          - class ops with page data for receiving a response
      
      We also have to define another one, but it's only used internally:
          - class ops with pagelist data for request parameters
      
      Note that we *still* haven't gotten rid of the osd request's
      r_data_in and r_data_out fields.  All the osd ops refer to them for
      their data.  For now, these data fields are pointers assigned to the
      appropriate r_data_* field when these new functions are called.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: set message data when building osd request · 39b44cbe
      Alex Elder authored
      All calls of ceph_osdc_start_request() are preceded (in the case of
      rbd, almost) immediately by a call to ceph_osdc_build_request().
      
      Move the build calls at the top of ceph_osdc_start_request() out of
      there and into ceph_osdc_build_request().  Nothing prevents
      moving these calls to the top of ceph_osdc_build_request(), either
      (and we're going to want them there in the next patch) so put them
      at the top.
      
      This and the next patch are related to:
          http://tracker.ceph.com/issues/4657
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: move ceph_osdc_build_request() · e65550fd
      Alex Elder authored
      This simply moves ceph_osdc_build_request() later in its source
      file without any change.  Done as a separate patch to facilitate
      review of the change in the next patch.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: format class info at init time · 5f562df5
      Alex Elder authored
      An object class method is formatted using a pagelist which contains
      the class name, the method name, and the data concatenated into an
      osd request's outbound data.
      
      Currently when a class op is initialized in osd_req_op_cls_init(),
      the lengths of and pointers to these three items are recorded.
      Later, when the op is getting formatted into the request message, a
      new pagelist is created and that is when these items get copied into
      the pagelist.
      
      This patch makes it so the pagelist to hold these items is created
      when the op is initialized instead.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
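The idea of building the outbound payload at op-initialization time, with the class name, method name, and data concatenated, can be sketched as follows. A fixed-size byte buffer stands in for the pagelist, and all names (`cls_payload`, `payload_append()`, `cls_payload_init()`) are hypothetical; the real code works with ceph pagelists and records the individual lengths in the op as well.

```c
#include <stddef.h>
#include <string.h>

/* Fixed-size stand-in for a pagelist. */
struct cls_payload {
    unsigned char buf[256];
    size_t len;
};

/* Append n bytes; returns 0 on success, -1 if they do not fit. */
static inline int payload_append(struct cls_payload *p,
                                 const void *src, size_t n)
{
    if (n > sizeof(p->buf) - p->len)
        return -1;
    if (n)
        memcpy(p->buf + p->len, src, n);
    p->len += n;
    return 0;
}

/*
 * Build the payload when the op is initialized: class name, then
 * method name, then the request data, back to back.
 */
static inline int cls_payload_init(struct cls_payload *p,
                                   const char *class_name,
                                   const char *method_name,
                                   const void *data, size_t data_len)
{
    p->len = 0;
    if (payload_append(p, class_name, strlen(class_name)))
        return -1;
    if (payload_append(p, method_name, strlen(method_name)))
        return -1;
    return payload_append(p, data, data_len);
}
```

Doing the copy at init time means nothing has to be assembled later, when the op is formatted into the request message.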
    • rbd: rearrange some code for consistency · 2169238d
      Alex Elder authored
      This patch just trivially moves around some code for consistency.
      
      In preparation for initializing osd request data fields in
      ceph_osdc_build_request(), I wanted to verify that rbd did in fact
      call that immediately before it called ceph_osdc_start_request().
      It was true (although image requests are built in a group and then
      started as a group).  But I made the changes here just to make
      it more obvious, by making all of the calls follow a common
      sequence:
      	osd_req_op_<optype>_init();
      	ceph_osd_data_<type>_init()
      	osd_req_op_<optype>_<datafield>()
      	rbd_osd_req_format()
      	...
      	ret = rbd_obj_request_submit()
      
      I moved the initialization of the callback for image object requests
      into rbd_img_request_fill_bio(), again, for consistency.  To avoid
      a forward reference, I moved the definition of rbd_img_obj_callback()
      up in the file.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: separate initialization of osd data · 44cd188d
      Alex Elder authored
      The osd data for a request is currently initialized inside
      rbd_osd_req_create(), but that assumes an object request's data
      belongs in the osd request's data in or data out field.
      
      There are only three places where requests with data are set up, and
      it turns out it's easier to call just the osd data init routines
      directly there rather than handling it in rbd_osd_req_create().
      
      (The real motivation here is moving toward getting rid of the
      osd request in and out data fields.)
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: don't set data in rbd_osd_req_format_op() · 2fa12320
      Alex Elder authored
      Currently an object request has its osd request's data field set in
      rbd_osd_req_format_op().  That assumes a single osd op per object
      request, and that won't be the case for long.
      
      Move the code that sets this out and into the caller.
      
      Rename rbd_osd_req_format_op() to be just rbd_osd_req_format(),
      removing the notion that it's doing anything op-specific.
      
      This and the next patch resolve:
          http://tracker.ceph.com/issues/4658
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: specify osd op by index in request · c99d2d4a
      Alex Elder authored
      An osd request now holds all of its source op structures, and every
      place that initializes one of these is in fact initializing one
      of the entries in the osd request's array.
      
      So rather than supplying the address of the op to initialize, have
      the caller specify the osd request and an indication of which op it
      would like to initialize.  This better hides the details of the
      op structure (and facilitates moving the data pointers they use).
      
      Since osd_req_op_init() is a common routine, and it's not used
      outside the osd client code, give it static scope.  Also make
      it return the address of the specified op (so all the other
      init routines don't have to repeat that code).
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
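Initializing an op by index rather than by address might look like the sketch below. The structures are heavily simplified stand-ins, and the exact signature shown for `osd_req_op_init()` is an assumption; only the function name, its static scope, and the fact that it returns the op's address come from the commit message.

```c
#include <assert.h>
#include <string.h>

#define OSD_MAX_OP 2  /* the op array is embedded in the request */

/* Simplified stand-in for a source op. */
struct osd_req_op {
    unsigned short op;  /* opcode */
};

/* Simplified stand-in for an osd request. */
struct osd_request {
    unsigned int r_num_ops;
    struct osd_req_op r_ops[OSD_MAX_OP];
};

/*
 * Common init routine: the caller names the request and the op
 * index; the address of the op is returned so type-specific init
 * routines do not have to look it up again.
 */
static inline struct osd_req_op *
osd_req_op_init(struct osd_request *req, unsigned int which,
                unsigned short opcode)
{
    struct osd_req_op *op;

    assert(which < req->r_num_ops);
    op = &req->r_ops[which];
    memset(op, 0, sizeof(*op));
    op->op = opcode;
    return op;
}
```

Because callers never hold an op pointer of their own, the layout of the op structure can change without touching them.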
    • libceph: add data pointers in osd op structures · 8c042b0d
      Alex Elder authored
      An extent type osd operation currently implies that there will
      be corresponding data supplied in the data portion of the request
      (for write) or response (for read) message.  Similarly, an osd class
      method operation implies a data item will be supplied to receive
      the response data from the operation.
      
      Add a ceph_osd_data pointer to each of those structures, and assign
      it to point to either the incoming or the outgoing data structure in
      the osd message.  The data is not always available when an op is
      initially set up, so add two new functions to allow setting them
      after the op has been initialized.
      
      Begin to make use of the data item pointer available in the osd
      operation rather than the request data in or out structure in
      places where it's convenient.  Add some assertions to verify
      pointers are always set the way they're expected to be.
      
      This is a sort of stepping stone toward really moving the data
      into the osd request ops, to allow for some validation before
      making that jump.
      
      This is the first in a series of patches that resolve:
          http://tracker.ceph.com/issues/4657
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: rename data out field in osd request op · 54d50649
      Alex Elder authored
      There are fields "indata" and "indata_len" defined in the ceph osd
      request op structure.  The "in" part is from the point of view
      of the osd server, but is a little confusing here on the client
      side.  Change their names to use "request" instead of "in" to
      indicate that it defines data provided with the request (as opposed
      to the data returned in the response).
      
      Rename the local variable in osd_req_encode_op() to match.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: keep source rather than message osd op array · 79528734
      Alex Elder authored
      An osd request keeps a pointer to the osd operations (ops) array
      that it builds in its request message.
      
      In order to allow each op in the array to have its own distinct
      data, we will need to keep track of each op's data, and that
      information does not go over the wire.
      
      As long as we're tracking the data we might as well just track the
      entire (source) op definition for each of the ops.  And if we're
      doing that, we'll have no more need to keep a pointer to the
      wire-encoded version.
      
      This patch makes the array of source ops be kept with the osd
      request structure, and uses that instead of the version encoded in
      the message in places where that was previously used.  The array
      will be embedded in the request structure, and the maximum number of
      ops we ever actually use is currently 2.  So reduce CEPH_OSD_MAX_OP
      to 2 to reduce the size of the structure.
      
      The result of doing this sort of ripples back up, and as a result
      various function parameters and local variables become unnecessary.
      
      Make r_num_ops be unsigned, and move the definition of struct
      ceph_osd_req_op earlier to ensure it's defined where needed.
      
      This patch does not yet add per-op data; that's coming soon.
      
      This resolves:
          http://tracker.ceph.com/issues/4656
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: define rbd_osd_req_format_op() · 430c28c3
      Alex Elder authored
      Define rbd_osd_req_format_op(), which encapsulates formatting
      an osd op into an object request's osd request message.  Only
      one op is supported right now.
      
      Stop calling ceph_osdc_build_request() in rbd_osd_req_create().
      Instead, call rbd_osd_req_format_op() in each of the callers of
      rbd_osd_req_create().
      
      This is to prepare for the next patch, in which the source ops for
      an osd request will be held in the osd request itself.  Because of
      that, we won't have the source op to work with until after the
      request is created, so we can't format the op until then.
      
      This and the next patch resolve:
          http://tracker.ceph.com/issues/4656
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: a few more osd data cleanups · 87060c10
      Alex Elder authored
      These are very small changes that make use of osd_data local pointers
      as shorthands for structures being operated on.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: define ceph_osd_data_length() · 23c08a9c
      Alex Elder authored
      One more osd data helper, which returns the length of the
      data item, regardless of its type.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
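A helper that returns the length of a data item regardless of its type amounts to a switch over a tagged union. The sketch below is an assumption-laden illustration: the enum values, the structure layout, and the name `osd_data_length()` are hypothetical, and each variant is reduced to just its length field.

```c
#include <stdint.h>

/* The three data types named in this series, plus "none". */
enum osd_data_type {
    OSD_DATA_NONE,
    OSD_DATA_PAGES,
    OSD_DATA_PAGELIST,
    OSD_DATA_BIO,
};

/* Simplified stand-in for an osd data item. */
struct osd_data {
    enum osd_data_type type;
    union {
        struct { uint64_t length; } pages;
        struct { uint64_t length; } pagelist;
        struct { uint64_t bio_length; } bio;
    } u;
};

/* Return the data length whatever the underlying type is. */
static inline uint64_t osd_data_length(const struct osd_data *d)
{
    switch (d->type) {
    case OSD_DATA_NONE:
        return 0;
    case OSD_DATA_PAGES:
        return d->u.pages.length;
    case OSD_DATA_PAGELIST:
        return d->u.pagelist.length;
    case OSD_DATA_BIO:
        return d->u.bio.bio_length;
    }
    return 0;  /* unknown type: treat as empty */
}
```

Callers can then ask for a length without caring which kind of data an op carries.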
    • libceph: define a few more helpers · c54d47bf
      Alex Elder authored
      Define ceph_osd_data_init() and ceph_osd_data_release() to clean up
      a little code.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: define osd data initialization helpers · 43bfe5de
      Alex Elder authored
      Define and use functions that encapsulate the initialization of a
      ceph_osd_data structure.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: compute incoming bytes once · 9fc6e064
      Alex Elder authored
      This is a simple change, extracting the number of incoming data
      bytes just once in handle_reply().
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: define inbound data size for method ops · 6010a451
      Alex Elder authored
      When rbd creates an object request containing an object method call
      operation it is passing 0 for the size.  I originally thought this
      was because the length was not needed for method calls, but I think
      it really should be supplied, to describe how much space is
      available to receive response data.  So provide the supplied length.
      
      This resolves:
          http://tracker.ceph.com/issues/4659
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: provide data length when preparing message · 98fa5dd8
      Alex Elder authored
      In prepare_message_data(), the length used to initialize the cursor
      is taken from the header of the message provided.  I'm working
      toward not using the header data length field to determine length in
      outbound messages, and this is a step in that direction.  For
      inbound messages this will be set to be the actual number of bytes
      that are arriving (which may be less than the total size of the data
      buffer available).
      
      This resolves:
          http://tracker.ceph.com/issues/4589
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • ceph: build osd request message later for writepages · e5975c7c
      Alex Elder authored
      Hold off building the osd request message in ceph_writepages_start()
      until just before it will be submitted to the osd client for
      execution.
      
      We'll still create the request and allocate the page pointer array
      after we learn we have at least one page to write.  A local variable
      will be used to keep track of the allocated array of pages.  Wait
      until just before submitting the request for assigning that page
      array pointer to the request message.
      
      Create and use a new function osd_req_op_extent_update() whose
      purpose is to serve this one spot where the length value supplied
      when an osd request's op was initially formatted might need to get
      changed (reduced, never increased) before submitting the request.
      
      Previously, ceph_writepages_start() assigned the message header's
      data length because of this update.  That's no longer necessary,
      because ceph_osdc_build_request() will recalculate the right
      value to use based on the content of the ops in the request.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • libceph: hold off building osd request · 02ee07d3
      Alex Elder authored
      Defer building the osd request until just before submitting it in
      all callers except ceph_writepages_start().  (That caller will be
      handled in the next patch.)
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • ceph: kill ceph alloc_page_vec() · 88486957
      Alex Elder authored
      There is a helper function alloc_page_vec() that, despite its
      generic-sounding name, depends heavily on an osd request structure
      being populated with certain information.
      
      There is only one place this function is used, and it ends up
      being a bit simpler to just open code what it does, so get
      rid of the helper.
      
      The real motivation for this is deferring the building of the osd
      request message, and this is a step in that direction.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • ceph: define ceph_writepages_osd_request() · 94fe8420
      Alex Elder authored
      Mostly for readability, define ceph_writepages_osd_request() and
      use it to allocate the osd request for ceph_writepages_start().
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>