1. 02 May, 2013 33 commits
    • A
      rbd: adjust image object request ref counting · b155e86c
      Alex Elder authored
      An extra reference is taken when an object request is added as one
      of the requests making up an image object.  A reference is dropped
      again when the image's object requests get submitted.
      
      The original reference for the object request will remain throughout
      this period, so we don't need to add and then take away an extra
      one.
      
      This can be interpreted as the image request inheriting the original
      object request's reference.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      b155e86c
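The reference inheritance described in this commit can be sketched in user-space C. This is a simplified illustration, not the rbd code: the struct layouts and the function names `obj_request_create()`, `obj_request_put()`, `img_request_add()`, and `img_request_destroy()` are hypothetical, and the counter is a plain non-atomic integer.

```c
#include <stdlib.h>

struct obj_request {
	int refs;
	struct obj_request *next;
};

struct img_request {
	struct obj_request *head;	/* object requests making up the image request */
};

/* A new object request starts with one reference, owned by the creator. */
static struct obj_request *obj_request_create(void)
{
	struct obj_request *obj = calloc(1, sizeof(*obj));

	if (obj)
		obj->refs = 1;
	return obj;
}

/* Drop one reference; free on the last one. Returns 1 if freed. */
static int obj_request_put(struct obj_request *obj)
{
	if (--obj->refs > 0)
		return 0;
	free(obj);
	return 1;
}

/* The image request inherits the creator's reference: no extra get here. */
static void img_request_add(struct img_request *img, struct obj_request *obj)
{
	obj->next = img->head;
	img->head = obj;
}

/* Release every inherited reference. Returns the number of objects freed. */
static int img_request_destroy(struct img_request *img)
{
	int freed = 0;

	while (img->head) {
		struct obj_request *obj = img->head;

		img->head = obj->next;
		freed += obj_request_put(obj);
	}
	return freed;
}
```

After `img_request_add()` the count is still 1, so the creator must not drop its reference separately; the image request now owns it.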
    • A
      libceph: kill off osd data write_request parameters · 406e2c9f
      Alex Elder authored
      In the incremental move toward supporting distinct data items in an
      osd request, some of the functions had "write_request" parameters to
      indicate, basically, whether the data belonged to in_data or the
      out_data.  Now that we maintain the data fields in the op structure
      there is no need to indicate the direction, so get rid of the
      "write_request" parameters.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      406e2c9f
    • A
      rbd: implement layered reads · 8b3e1a56
      Alex Elder authored
      Implement layered read requests for format 2 rbd images.
      
      If an rbd image is a clone of a snapshot, the snapshot will be the
      clone's "parent" image.  When an object read request on a clone
      comes back with ENOENT it indicates that the clone is not yet
      populated with that portion of the image's data, and the parent
      image should be consulted to satisfy the read.
      
      When this occurs, a new image request is created, directed to the
      parent image.  The offset and length of the image are the same as
      the image-relative offset and length of the object request that
      produced ENOENT.  Data from the parent image therefore satisfies the
      object read request for the original image request.
      
      While this code works, it will not be active until we enable the
      layering feature (by adding RBD_FEATURE_LAYERING to the value of
      RBD_FEATURES_SUPPORTED).
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      8b3e1a56
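The ENOENT fallback described in this commit can be sketched as follows. This is a user-space model under stated assumptions, not the kernel implementation: `struct image`, `object_read()`, and `layered_read()` are hypothetical names, objects are tiny fixed-size buffers, and the clone and parent are assumed to share the same object layout so that the same object index and offset correspond to the same image-relative offset.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define OBJ_SIZE  8	/* hypothetical object size, tiny for illustration */
#define OBJ_COUNT 2

/* Hypothetical image model: a NULL slot means the object does not exist. */
struct image {
	const char *objects[OBJ_COUNT];
	const struct image *parent;	/* NULL if the image is not a clone */
};

/* Read one extent from a single object; -ENOENT if the object is absent. */
static int object_read(const struct image *img, size_t obj, size_t off,
		       size_t len, char *buf)
{
	if (obj >= OBJ_COUNT || off > OBJ_SIZE || len > OBJ_SIZE - off)
		return -EINVAL;
	if (!img->objects[obj])
		return -ENOENT;
	memcpy(buf, img->objects[obj] + off, len);
	return 0;
}

/*
 * Layered read: if the clone lacks the object, redirect the read to the
 * parent image using the same offset and length.
 */
static int layered_read(const struct image *img, size_t obj, size_t off,
			size_t len, char *buf)
{
	int ret = object_read(img, obj, off, len, buf);

	if (ret == -ENOENT && img->parent)
		ret = layered_read(img->parent, obj, off, len, buf);
	return ret;
}
```

An image without a parent still reports `-ENOENT` to its caller, which keeps the decision about holes separate from the decision about layering.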
    • A
      rbd: probe the parent of an image if present · 2f82ee54
      Alex Elder authored
      Call the probe function for the parent device if one is present.
      Since we don't formally support the layering feature we won't
      be using this functionality just yet.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      2f82ee54
    • A
      rbd: add an object request flag for image data objects · 6365d33a
      Alex Elder authored
      Add a flag to distinguish between object requests being done on
      standalone objects and requests being sent for objects representing
      rbd image data (i.e., object requests that are the result of an
      image request).
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      6365d33a
    • A
      rbd: define an rbd object request flags field · 926f9b3f
      Alex Elder authored
      We're going to need some more Boolean values for object requests,
      so create a flags bit field and use it to record whether the request
      is done.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      926f9b3f
    • A
      rbd: encapsulate image object end request handling · 1217857f
      Alex Elder authored
      Encapsulate the code that completes processing of an object request
      that's part of an image request.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      1217857f
    • A
      rbd: define image request layered flag · d0b2e944
      Alex Elder authored
      Define a flag indicating whether an image request is for a layered
      image (one with a parent image to which requests will be redirected
      if the target object of a request does not exist).  The code that
      checks this flag will be added shortly.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      d0b2e944
    • A
      rbd: define image request originator flag · 9849e986
      Alex Elder authored
      Define a flag indicating whether an image request originated from
      the Linux block layer (from blk_fetch_request()) or whether it was
      initiated in order to satisfy an object request for a child image
      of a layered rbd device.  For image requests initiated by objects of
      child images we'll save a pointer to the object request rather than
      the Linux block request.
      
      For now, only block requests are used.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      9849e986
    • A
      rbd: define image request flags · 0c425248
      Alex Elder authored
      There are several Boolean values we'll be maintaining for image
      requests.  Switch from the single write_request field to a
      general-purpose flags field, and use one of its bits to represent
      the direction of I/O for the image request.  Define helper functions
      for setting and testing that flag.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      0c425248
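A flags field with set and test helpers, as this commit describes, might look like the sketch below. The enum values and helper names are assumptions chosen for illustration, and the helpers use plain bit arithmetic rather than the atomic bit operations kernel code would typically rely on.

```c
#include <stdbool.h>

/* Hypothetical flag bit numbers, one per Boolean property. */
enum img_req_flags {
	IMG_REQ_WRITE,		/* request is a write (otherwise a read) */
	IMG_REQ_CHILD,		/* initiated on behalf of a child image */
	IMG_REQ_LAYERED,	/* target image has a parent */
};

struct img_request {
	unsigned long flags;	/* one bit per enum img_req_flags value */
};

/* Mark the image request as a write. */
static void img_request_write_set(struct img_request *req)
{
	req->flags |= 1UL << IMG_REQ_WRITE;
}

/* Report whether the image request is a write. */
static bool img_request_write_test(const struct img_request *req)
{
	return (req->flags & (1UL << IMG_REQ_WRITE)) != 0;
}
```

Adding another Boolean then costs one enum entry and a pair of helpers instead of a new struct field.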
    • A
      rbd: record image-relative offset in object requests · 7da22d29
      Alex Elder authored
      For an image object request we will need to know what offset within
      the rbd image the request covers.  Record that when the object
      request gets created.
      
      Update the I/O error warnings to use this offset so that what's
      reported is more informative.
      
      Rename a local variable to fit the convention used everywhere else.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      7da22d29
    • A
      rbd: record aggregate image transfer count · 55f27e09
      Alex Elder authored
      Compute the total number of bytes transferred for an image
      request--the sum across each of the request's object requests.
      To avoid contention do it only when all object requests are
      complete, in rbd_img_request_complete().
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      55f27e09
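The aggregation described in this commit can be illustrated with a small sketch. The struct layouts and the name `img_request_complete()` are simplified stand-ins, not the real rbd definitions; the point is only that the sum is computed once, after all object requests have finished, so no shared counter is updated concurrently.

```c
#include <stddef.h>
#include <stdint.h>

struct obj_request {
	uint64_t xferred;	/* bytes transferred by this object request */
};

struct img_request {
	struct obj_request *objs;
	size_t obj_count;
	uint64_t xferred;	/* aggregate, valid only after completion */
};

/* Called once, after every object request has completed. */
static void img_request_complete(struct img_request *img)
{
	uint64_t total = 0;

	for (size_t i = 0; i < img->obj_count; i++)
		total += img->objs[i].xferred;
	img->xferred = total;
}
```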
    • A
      rbd: record overall image request result · a5a337d4
      Alex Elder authored
      If any image object request produces a non-zero result, preserve
      that as the result of the overall image request.  If multiple
      objects have non-zero results, save only the first one.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      a5a337d4
    • A
      rbd: update feature bits · 5cbf6f12
      Alex Elder authored
      There is a new rbd feature bit defined for "fancy striping." Add
      it to the ones defined in the kernel client.
      
      Change RBD_FEATURES_ALL so it represents the set of all feature
      bits (rather than just the ones we support).  Define a new symbol
      RBD_FEATURES_SUPPORTED to indicate the supported ones.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      5cbf6f12
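The split between "all defined" and "supported" feature bits can be sketched as below. `RBD_FEATURES_ALL`, `RBD_FEATURES_SUPPORTED`, and `RBD_FEATURE_LAYERING` are named in this log; the bit positions, the name `RBD_FEATURE_STRIPINGV2`, the empty supported set, and the helper `rbd_features_ok()` are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit assignments below are assumptions for illustration. */
#define RBD_FEATURE_LAYERING	(1ULL << 0)
#define RBD_FEATURE_STRIPINGV2	(1ULL << 1)	/* "fancy striping" */

/* Every feature bit that is defined. */
#define RBD_FEATURES_ALL \
	(RBD_FEATURE_LAYERING | RBD_FEATURE_STRIPINGV2)

/* Features this client implements; none are enabled in this sketch. */
#define RBD_FEATURES_SUPPORTED	(0ULL)

/* An image is usable only if it needs no feature outside the supported set. */
static bool rbd_features_ok(uint64_t image_features)
{
	return (image_features & ~RBD_FEATURES_SUPPORTED) == 0;
}
```

Enabling a feature later then only means adding its bit to `RBD_FEATURES_SUPPORTED`; the check itself does not change.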
    • A
      libceph: make method call data be a separate data item · 04017e29
      Alex Elder authored
      Right now the data for a method call is specified via a pointer and
      length, and it's copied--along with the class and method name--into
      a pagelist data item to be sent to the osd.  Instead, encode the
      data in a data item separate from the class and method names.
      
      This will allow large amounts of data to be supplied to methods
      without copying.  Only rbd uses the class functionality right now,
      and when it really needs this it will probably need to use a page
      array rather than a page list.  But this simple implementation
      demonstrates the functionality on the osd client, and that's enough
      for now.
      
      This resolves:
          http://tracker.ceph.com/issues/4104
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      04017e29
    • A
      libceph: combine initializing and setting osd data · a4ce40a9
      Alex Elder authored
      This ends up being a rather large patch but what it's doing is
      somewhat straightforward.
      
      Basically, this is replacing two calls with one.  The first of the
      two calls is initializing a struct ceph_osd_data with data (either a
      page array, a page list, or a bio list); the second is setting an
      osd request op so it associates that data with one of the op's
      parameters.  In place of those two will be a single function that
      initializes the op directly.
      
      That means we sort of fan out a set of the needed functions:
          - extent ops with pages data
          - extent ops with pagelist data
          - extent ops with bio list data
      and
          - class ops with page data for receiving a response
      
      We also have to define another one, but it's only used internally:
          - class ops with pagelist data for request parameters
      
      Note that we *still* haven't gotten rid of the osd request's
      r_data_in and r_data_out fields.  All the osd ops refer to them for
      their data.  For now, these data fields are pointers assigned to the
      appropriate r_data_* field when these new functions are called.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      a4ce40a9
    • A
      rbd: rearrange some code for consistency · 2169238d
      Alex Elder authored
      This patch just trivially moves around some code for consistency.
      
      In preparation for initializing osd request data fields in
      ceph_osdc_build_request(), I wanted to verify that rbd did in fact
      call that immediately before it called ceph_osdc_start_request().
      It was true (although image requests are built in a group and then
      started as a group).  But I made the changes here just to make
      it more obvious, by making all of the calls follow a common
      sequence:
      	osd_req_op_<optype>_init();
      	ceph_osd_data_<type>_init();
      	osd_req_op_<optype>_<datafield>();
      	rbd_osd_req_format();
      	...
      	ret = rbd_obj_request_submit();
      
      I moved the initialization of the callback for image object requests
      into rbd_img_request_fill_bio(), again, for consistency.  To avoid
      a forward reference, I moved the definition of rbd_img_obj_callback()
      up in the file.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      2169238d
    • A
      rbd: separate initialization of osd data · 44cd188d
      Alex Elder authored
      The osd data for a request is currently initialized inside
      rbd_osd_req_create(), but that assumes an object request's data
      belongs in the osd request's data in or data out field.
      
      There are only three places where requests with data are set up, and
      it turns out it's easier to call just the osd data init routines
      directly there rather than handling it in rbd_osd_req_create().
      
      (The real motivation here is moving toward getting rid of the
      osd request in and out data fields.)
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      44cd188d
    • A
      rbd: don't set data in rbd_osd_req_format_op() · 2fa12320
      Alex Elder authored
      Currently an object request has its osd request's data field set in
      rbd_osd_req_format_op().  That assumes a single osd op per object
      request, and that won't be the case for long.
      
      Move the code that sets this out and into the caller.
      
      Rename rbd_osd_req_format_op() to be just rbd_osd_req_format(),
      removing the notion that it's doing anything op-specific.
      
      This and the next patch resolve:
          http://tracker.ceph.com/issues/4658
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      2fa12320
    • A
      libceph: specify osd op by index in request · c99d2d4a
      Alex Elder authored
      An osd request now holds all of its source op structures, and every
      place that initializes one of these is in fact initializing one
      of the entries in the osd request's array.
      
      So rather than supplying the address of the op to initialize, have
      the caller specify the osd request and an indication of which op it
      would like to initialize.  This better hides the details of the
      op structure (and facilitates moving the data pointers they use).
      
      Since osd_req_op_init() is a common routine, and it's not used
      outside the osd client code, give it static scope.  Also make
      it return the address of the specified op (so all the other
      init routines don't have to repeat that code).
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      c99d2d4a
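Selecting an op by index, with a common init routine that returns the op's address, can be sketched as follows. All names here (`struct req_op`, `req_op_init()`, `req_op_extent_init()`) and the opcode values are hypothetical simplifications of the libceph interfaces; this sketch returns an error for an out-of-range index, which is a choice made for testability and not necessarily what the kernel does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_OPS 2	/* small embedded array, as described in this log */

struct req_op {
	uint16_t opcode;
	uint64_t offset, length;	/* used by extent ops only */
};

struct osd_request {
	unsigned int num_ops;
	struct req_op ops[MAX_OPS];	/* source ops embedded in the request */
};

/* Common init: zero the selected op, set its opcode, return its address. */
static struct req_op *req_op_init(struct osd_request *req, unsigned int which,
				  uint16_t opcode)
{
	struct req_op *op;

	if (which >= req->num_ops)
		return NULL;
	op = &req->ops[which];
	memset(op, 0, sizeof(*op));
	op->opcode = opcode;
	return op;
}

/* A type-specific init builds on the common routine. */
static int req_op_extent_init(struct osd_request *req, unsigned int which,
			      uint16_t opcode, uint64_t offset, uint64_t length)
{
	struct req_op *op = req_op_init(req, which, opcode);

	if (!op)
		return -1;
	op->offset = offset;
	op->length = length;
	return 0;
}
```

Callers never hold a pointer into the array themselves, so the layout of `struct req_op` can change without touching them.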
    • A
      libceph: add data pointers in osd op structures · 8c042b0d
      Alex Elder authored
      An extent type osd operation currently implies that there will
      be corresponding data supplied in the data portion of the request
      (for write) or response (for read) message.  Similarly, an osd class
      method operation implies a data item will be supplied to receive
      the response data from the operation.
      
      Add a ceph_osd_data pointer to each of those structures, and assign
      it to point to either the incoming or the outgoing data structure in
      the osd message.  The data is not always available when an op is
      initially set up, so add two new functions to allow setting them
      after the op has been initialized.
      
      Begin to make use of the data item pointer available in the osd
      operation rather than the request data in or out structure in
      places where it's convenient.  Add some assertions to verify
      pointers are always set the way they're expected to be.
      
      This is a sort of stepping stone toward really moving the data
      into the osd request ops, to allow for some validation before
      making that jump.
      
      This is the first in a series of patches that resolve:
          http://tracker.ceph.com/issues/4657
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      8c042b0d
    • A
      libceph: keep source rather than message osd op array · 79528734
      Alex Elder authored
      An osd request keeps a pointer to the osd operations (ops) array
      that it builds in its request message.
      
      In order to allow each op in the array to have its own distinct
      data, we will need to keep track of each op's data, and that
      information does not go over the wire.
      
      As long as we're tracking the data we might as well just track the
      entire (source) op definition for each of the ops.  And if we're
      doing that, we'll have no more need to keep a pointer to the
      wire-encoded version.
      
      This patch makes the array of source ops be kept with the osd
      request structure, and uses that instead of the version encoded in
      the message in places where that was previously used.  The array
      will be embedded in the request structure, and the maximum number of
      ops we ever actually use is currently 2.  So reduce CEPH_OSD_MAX_OP
      to 2 to reduce the size of the structure.
      
      The result of doing this sort of ripples back up, and as a result
      various function parameters and local variables become unnecessary.
      
      Make r_num_ops be unsigned, and move the definition of struct
      ceph_osd_req_op earlier to ensure it's defined where needed.
      
      This patch does not yet add per-op data; that's coming soon.
      
      This resolves:
          http://tracker.ceph.com/issues/4656
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      79528734
    • A
      rbd: define rbd_osd_req_format_op() · 430c28c3
      Alex Elder authored
      Define rbd_osd_req_format_op(), which encapsulates formatting
      an osd op into an object request's osd request message.  Only
      one op is supported right now.
      
      Stop calling ceph_osdc_build_request() in rbd_osd_req_create().
      Instead, call rbd_osd_req_format_op() in each of the callers of
      rbd_osd_req_create().
      
      This is to prepare for the next patch, in which the source ops for
      an osd request will be held in the osd request itself.  Because of
      that, we won't have the source op to work with until after the
      request is created, so we can't format the op until then.
      
      This and the next patch resolve:
          http://tracker.ceph.com/issues/4656
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      430c28c3
    • A
      libceph: define osd data initialization helpers · 43bfe5de
      Alex Elder authored
      Define and use functions that encapsulate the initialization of a
      ceph_osd_data structure.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      43bfe5de
    • A
      rbd: define inbound data size for method ops · 6010a451
      Alex Elder authored
      When rbd creates an object request containing an object method call
      operation it is passing 0 for the size.  I originally thought this
      was because the length was not needed for method calls, but I think
      it really should be supplied, to describe how much space is
      available to receive response data.  So provide the supplied length.
      
      This resolves:
          http://tracker.ceph.com/issues/4659
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      6010a451
    • A
      libceph: record length of bio list with bio · fdce58cc
      Alex Elder authored
      When assigning a bio pointer to an osd request, we don't have an
      efficient way of knowing the total length in bytes of the bio list.
      That information is available at the point it's set up by the rbd
      code, so record it with the osd data when it's set.
      
      This and the next patch are related to maintaining the length of a
      message's data independent of the message header, as described here:
          http://tracker.ceph.com/issues/4589
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      fdce58cc
    • A
      libceph: define source request op functions · 33803f33
      Alex Elder authored
      The rbd code has a function that allocates and populates a
      ceph_osd_req_op structure (the in-core version of an osd request
      operation).  When reviewed, Josh suggested two things: that the
      big varargs function might be better split into type-specific
      functions; and that this functionality really belongs in the osd
      client rather than rbd.
      
      This patch implements both of Josh's suggestions.  It breaks
      up the rbd function into separate functions and defines them
      in the osd client module as exported interfaces.  Unlike the
      rbd version, however, the functions don't allocate an osd_req_op
      structure; they are provided the address of one and that is
      initialized instead.
      
      The rbd function has been eliminated and calls to it have been
      replaced by calls to the new routines.  The rbd code now uses a
      stack (struct) variable to hold the op rather than allocating and
      freeing it each time.
      
      For now only the capabilities used by rbd are implemented.
      Implementing all the other osd op types, and making the rest of the
      code use it will be done separately, in the next few patches.
      
      Note that only the extent, cls, and watch portions of the
      ceph_osd_req_op structure are currently used.  Delete the others
      (xattr, pgls, and snap) from its definition so nobody thinks it's
      actually implemented or needed.  We can add it back again later
      if needed, when we know it's been tested.
      
      This (and a few follow-on patches) resolves:
          http://tracker.ceph.com/issues/3861
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      33803f33
    • A
      ceph: move max constant definitions · adfe695a
      Alex Elder authored
      Move some definitions for max integer values out of the rbd code and
      into the more central "decode.h" header file.  These really belong
      in a Linux (or libc) header somewhere, but I haven't gotten around
      to proposing that yet.
      
      This is in preparation for moving some code out of rbd.c and into
      the osd client.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      adfe695a
    • A
      libceph: let osd ops determine request data length · 175face2
      Alex Elder authored
      The length of outgoing data in an osd request is dependent on the
      osd ops that are embedded in that request.  Each op is encoded into
      a request message using osd_req_encode_op(), so that should be used
      to determine the amount of outgoing data implied by the op as it
      is encoded.
      
      Have osd_req_encode_op() return the number of bytes of outgoing data
      implied by the op being encoded, and accumulate and use that in
      ceph_osdc_build_request().
      
      As a result, ceph_osdc_build_request() no longer requires its "len"
      parameter, so get rid of it.
      
      Using the sum of the op lengths rather than the length provided is
      a valid change because:
          - The only callers of ceph_osdc_build_request() are
            rbd and the osd client (in ceph_osdc_new_request() on
            behalf of the file system).
          - When rbd calls it, the length provided is only non-zero for
            write requests, and in that case the single op has the
            same length value as what was passed here.
          - When called from ceph_osdc_new_request(), (it's not all that
            easy to see, but) the length passed is also always the same
            as the extent length encoded in its (single) write op if
            present.
      
      This resolves:
          http://tracker.ceph.com/issues/4406
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      175face2
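The idea of letting each encoded op report its own outgoing data length can be sketched as below. The names `encode_op()` and `build_request()` and the op kinds are hypothetical stand-ins for osd_req_encode_op() and ceph_osdc_build_request(); the wire encoding itself is omitted, and the sketch assumes only write ops carry outgoing data.

```c
#include <stddef.h>
#include <stdint.h>

enum op_kind { OP_READ, OP_WRITE, OP_WATCH };	/* hypothetical subset */

struct req_op {
	enum op_kind kind;
	uint64_t length;	/* extent length for read/write ops */
};

/* "Encode" one op and return the outgoing data bytes it implies. */
static uint64_t encode_op(const struct req_op *op)
{
	/* Wire encoding omitted; only a write carries outgoing data. */
	return op->kind == OP_WRITE ? op->length : 0;
}

/* Build a request: its data length is the sum over its ops. */
static uint64_t build_request(const struct req_op *ops, size_t num_ops)
{
	uint64_t data_len = 0;

	for (size_t i = 0; i < num_ops; i++)
		data_len += encode_op(&ops[i]);
	return data_len;
}
```

Because the length is derived from the ops, a caller can no longer pass a "len" value that disagrees with what the ops actually describe.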
    • A
      libceph: record byte count not page count · e0c59487
      Alex Elder authored
      Record the byte count for an osd request rather than the page count.
      The number of pages can always be derived from the byte count (and
      alignment/offset) but the reverse is not true.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      e0c59487
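The derivation mentioned in this commit, page count from byte count plus starting offset, is a short calculation. The sketch assumes 4 KiB pages, and `pages_for()` is a hypothetical helper name.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumes 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1ULL << SKETCH_PAGE_SHIFT)

/* Number of pages touched by `len` bytes starting at byte offset `off`. */
static uint64_t pages_for(uint64_t off, uint64_t len)
{
	if (len == 0)
		return 0;
	return ((off + len + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT) -
		(off >> SKETCH_PAGE_SHIFT);
}
```

Two bytes straddling a page boundary and a full 8192-byte aligned extent both touch two pages, which is why a page count alone cannot recover the byte count.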
    • A
      libceph: separate read and write data · 0fff87ec
      Alex Elder authored
      An osd request defines information about where data to be read
      should be placed as well as where data to write comes from.
      Currently these are represented by common fields.
      
      Keep information about data for writing separate from data to be
      read by splitting these into data_in and data_out fields.
      
      This is the key patch in this whole series, in that it actually
      identifies which osd requests generate outgoing data and which
      generate incoming data.  It's less obvious (currently) that an osd
      CALL op generates both outgoing and incoming data; that's the focus
      of some upcoming work.
      
      This resolves:
          http://tracker.ceph.com/issues/4127
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      0fff87ec
    • A
      libceph: distinguish page and bio requests · 2ac2b7a6
      Alex Elder authored
      An osd request uses either pages or a bio list for its data.  Use a
      union to record information about the two, and add a data type
      tag to select between them.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      2ac2b7a6
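A tagged union of the kind this commit describes might look like the following sketch. The type and function names are illustrative assumptions, `struct page` and `struct bio` are one-byte stand-ins for the kernel types, and the union is given a name (`u`) to stay portable to older C standards.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the kernel's page and bio types. */
struct page { char unused; };
struct bio  { char unused; };

enum osd_data_type {
	OSD_DATA_TYPE_NONE,
	OSD_DATA_TYPE_PAGES,
	OSD_DATA_TYPE_BIO,
};

struct osd_data {
	enum osd_data_type type;	/* says which union member is valid */
	union {
		struct {
			struct page **vec;
			uint64_t length;	/* bytes described by the page array */
		} pages;
		struct bio *bio;
	} u;
};

static void osd_data_pages_init(struct osd_data *d, struct page **vec,
				uint64_t length)
{
	d->type = OSD_DATA_TYPE_PAGES;
	d->u.pages.vec = vec;
	d->u.pages.length = length;
}

static void osd_data_bio_init(struct osd_data *d, struct bio *bio)
{
	d->type = OSD_DATA_TYPE_BIO;
	d->u.bio = bio;
}

/* Consumers check the tag before touching the union. */
static struct bio *osd_data_bio(const struct osd_data *d)
{
	return d->type == OSD_DATA_TYPE_BIO ? d->u.bio : NULL;
}
```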
    • A
      libceph: separate osd request data info · 2794a82a
      Alex Elder authored
      Pull the fields in an osd request structure that define the data for
      the request out into a separate structure.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      2794a82a
  2. 18 Apr, 2013 1 commit
  3. 30 Mar, 2013 1 commit
    • A
      rbd: don't zero-fill non-image object requests · 6e2a4505
      Alex Elder authored
      A result of ENOENT from a read request for an object that's part of
      an rbd image indicates that there is a hole in that portion of the
      image.  Similarly, a short read for such an object indicates that
      the remainder of the read should be interpreted as a full read with
      zeros filling out the end of the request.
      
      This behavior is not correct for objects that are not backing rbd
      image data.  Currently rbd_img_obj_request_callback() assumes it
      should be done for all objects.
      
      Change rbd_img_obj_request_callback() so it only does this zeroing
      for image objects.  Encapsulate that special handling in its own
      function.  Add an assertion that the image object request is a bio
      request, since we assume that (and we currently don't support any
      other types).
      
      This resolves a problem identified here:
          http://tracker.ceph.com/issues/4559
      
      The regression was introduced by bf0d5f50.
      Reported-by: Dan van der Ster <dan@vanderster.com>
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      6e2a4505
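The zero-fill rules in this commit can be sketched as a single fix-up step applied to a completed read. The function name `img_obj_read_fixup()` and its parameters are hypothetical; a plain byte buffer stands in for the bio the kernel code works with.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Fix up the outcome of reading `len` bytes into `buf`.  `result` is the
 * osd result and `*xferred` the number of bytes actually read.  Zero-fill
 * is applied only when the object backs rbd image data.
 */
static int img_obj_read_fixup(bool is_img_data, int result, char *buf,
			      size_t len, size_t *xferred)
{
	if (!is_img_data)
		return result;		/* standalone object: leave untouched */

	if (result == -ENOENT) {
		memset(buf, 0, len);	/* hole: the whole extent reads as zeros */
		*xferred = len;
		result = 0;
	} else if (result == 0 && *xferred < len) {
		/* Short read: zero the tail and report a full read. */
		memset(buf + *xferred, 0, len - *xferred);
		*xferred = len;
	}
	return result;
}
```

For a non-image object the `-ENOENT` passes through unchanged, so callers can still tell that the object is missing.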
  4. 27 Feb, 2013 3 commits
    • S
      libceph: update osd request/reply encoding · 1b83bef2
      Sage Weil authored
      Use the new version of the encoding for osd requests and replies.  In the
      process, update the way we are tracking request ops and reply lengths and
      results in the struct ceph_osd_request.  Update the rbd and fs/ceph users
      appropriately.
      
      The main changes are:
       - we keep pointers into the request memory for fields we need to update
         each time the request is sent out over the wire
       - we keep information about the result in an array in the request struct
         where the users can easily get at it.
      Signed-off-by: Sage Weil <sage@inktank.com>
      Reviewed-by: Alex Elder <elder@inktank.com>
      1b83bef2
    • A
      rbd: pass length, not op for osd completions · c47f9371
      Alex Elder authored
      The only thing type-specific osd completion functions do with their
      osd op parameter is (in some cases) extract the number of bytes
      transferred from it.  In the other cases, the xferred bytes field
      is not used, and total message data transfer byte count (which may
      well be zero) is used.
      
      Just set the object request transfer count in the main osd request
      callback function and provide that to the other routines.  There is
      then no longer any need to pass the op pointer to the type-specific
      completion routines, so drop those parameters.
      
      Stop doing anything with the total message data length.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      c47f9371
    • A
      rbd: move rbd_osd_trivial_callback() · 39bf2c5d
      Alex Elder authored
      This function is slightly out of place, probably the result
      of an errant automatic merge or something.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      39bf2c5d
  5. 26 Feb, 2013 2 commits