1. 02 May, 2013 40 commits
    • A
      rbd: avoid dropping extra reference in rbd_free_disk() · a0cab924
      Alex Elder committed
      I found during some failure injection testing that the call to
      rbd_free_disk() in the error path of rbd_dev_probe_finish() was
      dropping an extra reference to the disk queue.  The problem
      occurred when put_disk tried to drop a reference to the disk's
      queue.  A call to blk_cleanup_queue() just prior to that will have
      also dropped a reference to the queue.
      
      The problem is that the reference dropped by put_disk() is assumed
      to have been taken by add_disk().  Our code has error paths that can
      occur after the disk and its queue are initialized, but before the
      call to add_disk(), and in those paths we won't have that extra
      reference.
      
      The fix is easy though.  In rbd_free_disk() we're already checking
      the disk's GENHD_FL_UP flag.  That flag is an indication that
      add_disk() has been called, so just call blk_cleanup_queue()
      conditional on that flag being set.
      
      This resolves:
          http://tracker.ceph.com/issues/4800
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      a0cab924
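The reference accounting described in this commit can be sketched with a toy user-space model. Everything here is a stand-in: `toy_queue`, `toy_disk`, and the `toy_*` helpers are hypothetical names, and the single counter only approximates how the block layer counts queue references. The point it illustrates is that cleaning up the queue only when the disk was actually added leaves the count balanced on both paths.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not kernel code: every name below is a stand-in. */
struct toy_queue { int refs; };
struct toy_disk {
    struct toy_queue *queue;
    bool up;                /* stands in for GENHD_FL_UP */
};

/* Initializing the queue gives it its first reference. */
static void toy_init(struct toy_disk *d, struct toy_queue *q)
{
    q->refs = 1;
    d->queue = q;
    d->up = false;
}

/* add_disk() is modeled as taking one more queue reference. */
static void toy_add_disk(struct toy_disk *d)
{
    d->queue->refs++;
    d->up = true;
}

/* Teardown as the commit describes it: queue cleanup only if "up". */
static int toy_free_disk(struct toy_disk *d)
{
    if (d->up)
        d->queue->refs--;   /* models blk_cleanup_queue() */
    d->queue->refs--;       /* models put_disk() */
    return d->queue->refs;  /* 0 means balanced */
}
```

In this model an unconditional cleanup on the error path (disk initialized but never added) would drive the count to -1, which is the extra drop the commit message describes.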
    • A
      rbd: use rbd_obj_method_sync() return value · f40eb349
      Alex Elder committed
      Now that rbd_obj_method_sync() returns the number of bytes
      returned by the method call, that value should be used by
      callers to ensure we don't overrun the valid portion of the
      buffer.
      
      Fix the two spots that remained that weren't doing that,
      rbd_dev_image_name() and rbd_dev_v2_snap_name().
      
      Rearrange the error path slightly in rbd_dev_v2_snap_name().
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      f40eb349
    • A
      rbd: fix leak of format 2 snapshot names · 6e584f52
      Alex Elder committed
      When the snapshot context for an rbd device gets updated (or the
      initial one is recorded) a list of snapshot structures is created
      to represent them, one entry per snapshot.  Each entry includes a
      dynamically-allocated copy of the snapshot name.
      
      Currently the name is allocated in rbd_snap_create(), as a duplicate
      of the passed-in name.
      
      For format 1 images, the snapshot name provided is just a pointer to
      an existing name.  But for format 2 images, the passed-in name is
      already dynamically allocated, and in the process of duplicating
      it here we are leaking the passed-in name.
      
      Fix this by dynamically allocating the name for format 1 snapshots
      also, and then stop allocating a duplicate in rbd_snap_create().
      
      Change rbd_dev_v1_snap_info() so none of its parameters is
      side-effected unless it's going to return success.
      
      This is part of:
          http://tracker.ceph.com/issues/4803
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      6e584f52
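The ownership rule this commit settles on (the caller always hands over a heap-allocated name, and the create function keeps it rather than copying it) can be sketched in user-space C. The `toy_*` names are hypothetical and only mirror the idea; the driver's real functions and structures differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_snap { char *name; };

/* Makes a heap copy of a name, as a format 1 caller would now do. */
static char *toy_dup_name(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);

    if (copy)
        memcpy(copy, s, n);
    return copy;
}

/* Takes ownership of a heap-allocated name; no duplicate is made. */
static struct toy_snap *toy_snap_create(char *name)
{
    struct toy_snap *snap = malloc(sizeof(*snap));

    if (!snap)
        return NULL;        /* on failure the caller still owns name */
    snap->name = name;
    return snap;
}

/* Frees the snapshot and the name it owns. */
static void toy_snap_destroy(struct toy_snap *snap)
{
    free(snap->name);
    free(snap);
}
```

Because the create function stores the pointer it is given, there is exactly one allocation per name and exactly one place that frees it.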
    • A
      rbd: rename __rbd_add_snap_dev() · 6087b51b
      Alex Elder committed
      Rename __rbd_add_snap_dev() to be rbd_snap_create().  We no longer
      have devices for non-mapped snapshots, and we're not actually
      "adding" it to the list in this function, just creating it.
      
      Rename rbd_remove_snap_dev() to be rbd_snap_destroy() for reasons
      similar to the above.  Stop having this function delete the snapshot
      from its list (to be symmetrical with its create counterpart) and do
      that in the caller instead.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      6087b51b
    • A
      rbd: only update values on snap_info success · acb1b6ca
      Alex Elder committed
      Change rbd_dev_v2_snap_info() so it only ever sets values of the
      size and features parameters if looking up the snapshot name was
      successful.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      acb1b6ca
    • A
      rbd: make snap_size order parameter optional · c86f86e9
      Alex Elder committed
      Only one of the two callers of _rbd_dev_v2_snap_size() needs the
      order value returned.  So make that an optional argument--a null
      pointer if the caller doesn't need it.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      c86f86e9
    • A
      rbd: fix leak of snapshots during initial probe · 522a0cc0
      Alex Elder committed
      When an rbd image is initially mapped, its snapshot context is
      collected, and then a list of snapshot entries representing the
      snapshots in that context is created.  The list is created using
      rbd_dev_snaps_update().  (This function also supports updating an
      existing snapshot list based on a new snapshot context.)
      
      If an error occurs, updating the list is aborted, and the list is
      currently left as-is, in an inconsistent state.  At that point,
      there may be a partially-constructed list, but the calling functions
      (rbd_dev_probe_finish() from rbd_dev_probe() from rbd_add()) never
      clean them up.  So this constitutes a leak.
      
      A snapshot list that is inconsistent with the current snapshot
      context is of no use, and might even be actively bad.  So rather
      than just having the caller clean it up, have rbd_dev_snaps_update()
      just clear out the entire snapshot list in the event an error
      occurs.
      
      The other place rbd_dev_snaps_update() is used is when a refresh is
      triggered, either because of a watch callback or via a write to the
      /sys/bus/rbd/devices/<id>/refresh interface.  An error while
      updating the snapshots has no substantive effect in either of those
      cases, but one of them issues a warning.  Move that warning to the
      common rbd_dev_refresh() function so it gets issued regardless of
      how it got initiated.
      
      This is part of:
          http://tracker.ceph.com/issues/4803
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      522a0cc0
    • A
      rbd: don't create sysfs entries for non-mapped snapshots · 3e83b65b
      Alex Elder committed
      When an rbd image gets mapped a device entry gets created for it
      under /sys/bus/rbd/devices/<id>/.  Inside that directory there are
      sysfs files that contain information about the image: its size,
      feature bits, major device number, and so on.
      
      Additionally, if that image has any snapshots, a device entry gets
      created for each of those as a "child" of the mapped device.  Each
      of these is a subdirectory of the mapped device, and each directory
      contains a few files with information about the snapshot (its
      snapshot id, size, and feature mask).
      
      There is no clear benefit to having those device entries for the
      snapshots.  The information provided via sysfs is of little real
      value--and all of it is available via rbd CLI commands.  If we
      still wanted to see the kernel's view of this information it could
      be done much more simply by including it in a single sysfs file for
      the mapped image.
      
      But there *is* a clear cost to supporting them.  Every time a snapshot
      context changes, these entries need to be updated (deleted snapshots
      removed, new snapshots created).  The rbd driver is notified of
      changes to the snapshot context via callbacks from an osd, and care
      must be taken to coordinate removal of snapshot data structures
      with the possibility of one of these notifications occurring.
      
      Things would be considerably simpler if we just didn't have to
      maintain device entries for the snapshots.
      
      So get rid of them.
      
      The ability to map a snapshot of an rbd image will remain; the only
      thing lost will be the ability to query these sysfs directories for
      information about snapshots of mapped images.
      
      This resolves:
          http://tracker.ceph.com/issues/4796
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      3e83b65b
    • A
      rbd: activate support for layered images · 770eba6e
      Alex Elder committed
      Now that we have most everything in place to support layered rbd
      images, enable support for them in the kernel client.  Issue a
      warning to the log that the support is considered experimental
      whenever a format 2 layered image is mapped.
      
      Note that we also have to claim to support the STRIPINGV2 feature,
      due to a mistake in the way the rbd CLI set up those flags.  This
      feature can work if it has the right parameters, and safeguards
      have been put in place to reject those images that do not have
      compatible parameters.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      770eba6e
    • A
      rbd: get and check striping parameters · cc070d59
      Alex Elder committed
      If an rbd format 2 image indicates it supports the STRIPINGV2
      feature we need to find out its stripe unit and stripe count in
      order to know whether we can use it.  We don't yet support fancy
      striping fully, but if the default parameters are used the behavior
      is indistinguishable from non-fancy striping.
      
      This is necessary because some images require the STRIPINGV2 feature
      even if they use the default parameters.  (Which is to say the feature
      bit was erroneously set even if the feature was not used.)
      
      This resolves:
          http://tracker.ceph.com/issues/4709
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      cc070d59
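A rough sketch of the kind of check this commit describes follows. The commit message does not say what the default parameters are; the sketch assumes they are a stripe unit equal to the object size and a stripe count of 1. The function name `toy_check_striping` and the choice of `-EINVAL` are likewise illustrative assumptions, not taken from the commit.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Accept an image's striping parameters only if they match the
 * (assumed) defaults: stripe unit == object size, stripe count == 1.
 */
static int toy_check_striping(unsigned int obj_order,
                              uint64_t stripe_unit, uint64_t stripe_count)
{
    uint64_t obj_size = (uint64_t)1 << obj_order;

    if (stripe_unit != obj_size)
        return -EINVAL;     /* non-default unit: not supported here */
    if (stripe_count != 1)
        return -EINVAL;     /* non-default count: not supported here */
    return 0;
}
```

With an object order of 22 (4 MiB objects), only a 4 MiB stripe unit with a count of 1 would pass this check.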
    • A
      rbd: have rbd_obj_method_sync() return transfer count · 57385b51
      Alex Elder committed
      Callers of rbd_obj_method_sync() don't know how many bytes of data
      got returned by the class method call.  As a result, they have been
      assuming enough got returned to decode whatever was expected.
      
      This isn't safe.  We know how many bytes got transferred, so have
      rbd_obj_method_sync() return that amount (rather than just 0) if
      the call is successful.
      
      Change all callers to use this return value to ensure decoding of
      the results is done safely.
      
      On the other hand, most callers of rbd_obj_method_sync() only
      indicate success or failure, so all of *their* callers can simply
      test for non-zero result.
      
      This resolves:
          http://tracker.ceph.com/issues/4773
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      57385b51
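The calling pattern this change enables can be sketched as follows. `toy_method_sync` is a hypothetical stand-in for a class method call that returns the number of bytes it delivered; the caller refuses to decode when fewer bytes arrived than it needs. The use of `-ERANGE` and the host-byte-order decode are simplifying assumptions for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/*
 * Stand-in for a method call: copies up to buf_len bytes of the reply
 * and returns how many bytes were copied (or a negative errno).
 */
static int toy_method_sync(const void *reply, size_t reply_len,
                           void *buf, size_t buf_len)
{
    size_t n = reply_len < buf_len ? reply_len : buf_len;

    memcpy(buf, reply, n);
    return (int)n;
}

/* Decode a 64-bit size only if enough bytes actually arrived. */
static int toy_get_size(const void *reply, size_t reply_len, uint64_t *size)
{
    uint64_t raw;
    int ret = toy_method_sync(reply, reply_len, &raw, sizeof(raw));

    if (ret < 0)
        return ret;
    if ((size_t)ret < sizeof(raw))
        return -ERANGE;     /* short reply: do not decode */
    *size = raw;            /* host byte order, for simplicity */
    return 0;
}
```

Callers of `toy_get_size()` see only success or failure, which matches the commit's note that the layer above can simply test for a non-zero result.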
    • A
      rbd: void data pointers for rbd_obj_method_sync() · 4157976b
      Alex Elder committed
      Make the inbound and outbound data parameters have void rather than
      character type for rbd_obj_method_sync().  This makes it more clear
      they don't expect typed data, and eliminates the need for some silly
      type casts.
      
      One more unrelated change: define the features buffer used in
      _rbd_dev_v2_snap_features() to be a packed data structure.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      4157976b
    • A
      rbd: give rbd_obj_read_sync() buffer void type · 80ef15bf
      Alex Elder committed
      Make the buf parameter into which the data is to be read have type
      void pointer.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      80ef15bf
    • A
      rbd: enforce parent overlap · a9e8ba2c
      Alex Elder committed
      A clone image has a defined overlap point with its parent image.
      That is the byte offset beyond which the parent image has no
      defined data to back the clone, and anything thereafter can be
      viewed as being zero-filled by the clone image.
      
      This is needed because a clone image can be resized.  If it gets
      resized larger than the snapshot it is based on, the overlap defines
      the original size.  If the clone gets resized downward below the
      original size the new clone size defines the overlap.  If the clone
      is subsequently resized to be larger, the overlap won't be increased
      because the previous resize invalidated any parent data beyond that
      point.
      
      This resolves:
          http://tracker.ceph.com/issues/4724
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      a9e8ba2c
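The overlap rule described in this commit can be written down as two small helpers. This is only an illustration of the arithmetic in the message, with hypothetical names; it is not the driver's code, and it says nothing about where the overlap value is actually maintained.

```c
#include <assert.h>
#include <stdint.h>

/* How many bytes of [offset, offset + length) the parent can back. */
static uint64_t toy_parent_bytes(uint64_t offset, uint64_t length,
                                 uint64_t overlap)
{
    if (offset >= overlap)
        return 0;                   /* entirely beyond the overlap */
    if (length > overlap - offset)
        return overlap - offset;    /* clipped at the overlap point */
    return length;
}

/* Overlap after a clone resize: it can shrink but never grows back. */
static uint64_t toy_overlap_after_resize(uint64_t overlap, uint64_t new_size)
{
    return new_size < overlap ? new_size : overlap;
}
```

Anything past the returned byte count is treated as zero-filled from the clone's point of view, as the message describes.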
    • A
      rbd: issue a copyup for layered writes · 0eefd470
      Alex Elder committed
      This implements the main copyup functionality for layered writes.
      
      Here we add a copyup_pages field to the object request, which is
      used only for copyup requests to keep track of the page array
      containing data read from the parent image.
      
      A copyup request is currently the only request rbd has that requires
      two osd operations.  Because of this we handle copyup specially.
      All image object requests get an osd request allocated when they are
      created.  For a write request, if a copyup is required, the osd
      request originally allocated is released, and a new one (with room
      for two osd ops) is allocated to replace it.  A new function
      rbd_osd_req_create_copyup() allocates an osd request suitable for
      a copyup request.
      
      The first op is then filled with a copyup object class method call,
      supplying the array of pages containing data read from the parent.
      The second op is filled in with the original write request.
      
      The original request otherwise remains intact, and it describes the
      original write request (found in the second osd op).  The presence
      of the copyup op is sort of implicit; a non-null copyup_pages field
      could be used to distinguish between a "normal" write request and a
      request containing both a copyup call and a write.
      
      This resolves:
          http://tracker.ceph.com/issues/3419
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      0eefd470
    • A
      rbd: implement full object parent reads · 3d7efd18
      Alex Elder committed
      As a step toward implementing layered writes, implement reading the
      data for a target object from the parent image for a write request
      whose target object is known to not exist.  Add a copyup_pages field
      to an image request to track the page array used (only) for such a
      request.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      3d7efd18
    • L
      rbd: revalidate_disk upon rbd resize · d98df63e
      Laurent Barbe committed
      If an rbd disk is open when an rbd resize is done, the new size is
      not visible to the filesystem.  As is done in the virtio-blk and dm
      drivers, calling revalidate_disk() updates the bd_inode size.
      Signed-off-by: Laurent Barbe <laurent@ksperis.com>
      Reviewed-by: Alex Elder <elder@inktank.com>
      d98df63e
    • A
      rbd: support page array image requests · f1a4739f
      Alex Elder committed
      This patch adds the ability to build an image request whose data
      will be written from or read into memory described by a page array.
      (Previously only bio lists were supported.)
      
      Originally this was going to define a new function for this purpose
      but it was largely identical to rbd_img_request_fill_bio().  So
      instead, rbd_img_request_fill_bio() has been generalized to handle
      both types of image request.
      
      For the moment we still only fill image requests with bio data.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      f1a4739f
    • A
      rbd: define zero_pages() · b9434c5b
      Alex Elder committed
      Define a new function zero_pages() that zeroes a range of memory
      defined by a page array, along the lines of zero_bio_chain().  It
      saves and restores the irq flags like bvec_kmap_irq() does, though I'm not
      sure at this point that it's necessary.
      
      Update rbd_img_obj_request_read_callback() to use the new function
      if the object request contains page rather than bio data.
      
      For the moment, only bio data is used for osd READ ops.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      b9434c5b
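Zeroing a byte range that spans a page array comes down to walking the range one page-sized piece at a time. The user-space sketch below shows that walk with a deliberately tiny, made-up page size; `toy_zero_pages` and `TOY_PAGE_SIZE` are hypothetical, and the page mapping and irq handling mentioned in the message are omitted entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 16    /* tiny on purpose; real pages are far larger */

/* Zero bytes [offset, end) of the memory described by a page array. */
static void toy_zero_pages(unsigned char **pages, size_t offset, size_t end)
{
    while (offset < end) {
        size_t idx = offset / TOY_PAGE_SIZE;        /* which page */
        size_t page_off = offset % TOY_PAGE_SIZE;   /* where in it */
        size_t n = TOY_PAGE_SIZE - page_off;        /* room left in page */

        if (n > end - offset)
            n = end - offset;
        memset(pages[idx] + page_off, 0, n);
        offset += n;
    }
}
```

The caller is responsible for making sure the array holds enough pages to cover `end`.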
    • A
      rbd: encapsulate submission of image object requests · b454e36d
      Alex Elder committed
      Object requests that are part of an image request are subject to
      some additional handling.  Define rbd_img_obj_request_submit() to
      encapsulate that, and use it when initially submitting an image
      object request, and when re-submitting it during callback of
      an object existence check.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      b454e36d
    • A
      rbd: define separate read and write format funcs · 9d4df01f
      Alex Elder committed
      Separate rbd_osd_req_format() into two functions, one for read
      requests and the other for write requests.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      9d4df01f
    • A
      rbd: issue stat request before layered write · c5b5ef6c
      Alex Elder committed
      This is a step toward fully implementing layered writes.
      
      Add checks before request submission for the object(s) associated
      with an image request.  For write requests, if we don't know that
      the target object exists, issue a STAT request to find out.  When
      that request completes, mark the known and exists flags for the
      original object request accordingly and re-submit the object
      request.  (Note that this still does the existence check only; the
      copyup operation is not yet done.)
      
      A new object request is created to perform the existence check.  A
      pointer to the original request is added to that object request to
      allow the stat request to re-issue the original request after
      updating its flags.  If there is a failure with the stat request
      the error code is stored with the original request, which is then
      completed.
      
      This resolves:
          http://tracker.ceph.com/issues/3418
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      c5b5ef6c
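The control flow in this commit (check existence first, then update the original request and re-submit it) can be sketched as a small state update. All names are hypothetical. Treating `-ENOENT` from the existence check as "known not to exist" rather than as a failure is an assumption of this sketch; the commit message only says that the flags are set "accordingly" and that a failed check completes the original request with the error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy object request; field names are illustrative, not the driver's. */
struct toy_obj_req {
    bool known;     /* is the exists field meaningful? */
    bool exists;    /* does the target object exist? */
    int result;     /* error recorded if the check itself fails */
};

enum toy_action { TOY_SUBMIT_NOW, TOY_STAT_FIRST };

/* Decide what to do before submitting an object request. */
static enum toy_action toy_pre_submit(const struct toy_obj_req *req,
                                      bool is_write)
{
    if (is_write && !req->known)
        return TOY_STAT_FIRST;
    return TOY_SUBMIT_NOW;
}

/*
 * Apply the outcome of the existence check to the original request.
 * Returns true if the original request should be re-submitted.
 */
static bool toy_stat_done(struct toy_obj_req *req, int stat_result)
{
    if (stat_result == 0 || stat_result == -ENOENT) {
        req->known = true;
        req->exists = (stat_result == 0);
        return true;
    }
    req->result = stat_result;  /* complete the original with this error */
    return false;
}
```

After a successful check, `toy_pre_submit()` returns `TOY_SUBMIT_NOW` for the same request, so the existence check is issued at most once per request in this model.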
    • A
      rbd: add target object existence flags · 5679c59f
      Alex Elder committed
      This creates two new flags for object requests to indicate what is
      known about the existence of the object to which a request is to be
      sent.  The KNOWN flag will be true if the EXISTS flag is
      meaningful.  That is:
      
          KNOWN   EXISTS
          -----   ------
            0       0     don't know whether the object exists
            0       1     (not used/invalid)
            1       0     object is known to not exist
            1       1     object is known to exist
      
      This will be used in determining how to handle write requests for
      data objects for layered rbd images.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      5679c59f
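A pair of flag bits with this meaning might be handled along the following lines. The bit positions and helper names are made up for illustration. The one design point the sketch tries to capture is that EXISTS is only ever reported through a helper that also checks KNOWN, so the unused combination (EXISTS set without KNOWN) never reads as "exists".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit assignments for the two flags. */
#define TOY_FLAG_KNOWN   (1u << 0)
#define TOY_FLAG_EXISTS  (1u << 1)

/* Record what was learned; KNOWN is always set along with the answer. */
static unsigned int toy_existence_set(unsigned int flags, bool exists)
{
    flags &= ~TOY_FLAG_EXISTS;
    if (exists)
        flags |= TOY_FLAG_EXISTS;
    return flags | TOY_FLAG_KNOWN;
}

static bool toy_known(unsigned int flags)
{
    return (flags & TOY_FLAG_KNOWN) != 0;
}

/* EXISTS is meaningful only when KNOWN is also set. */
static bool toy_exists(unsigned int flags)
{
    return toy_known(flags) && (flags & TOY_FLAG_EXISTS) != 0;
}
```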
    • A
      rbd: always check IMG_DATA flag · 57acbaa7
      Alex Elder committed
      In a few spots, whether an object request's img_request pointer
      is null is used to determine whether an object request is being done
      as part of an image data request.
      
      Stop doing that, and instead always use the object request IMG_DATA
      flag for that purpose.  Swap the order of the definition of the
      IMG_DATA and DONE flag helpers, because obj_request_done_set() now
      refers to obj_request_img_data_set() to get its rbd_dev value.
      
      This will become important because the img_request pointer is
      about to become part of a union.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      57acbaa7
    • A
      rbd: adjust image object request ref counting · b155e86c
      Alex Elder committed
      An extra reference is taken when an object request is added as one
      of the requests making up an image request.  A reference is dropped
      again when the image's object requests get submitted.
      
      The original reference for the object request will remain throughout
      this period, so we don't need to add and then take away an extra
      one.
      
      This can be interpreted as the image request inheriting the original
      object request's reference.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      b155e86c
    • A
      libceph: kill off osd data write_request parameters · 406e2c9f
      Alex Elder committed
      In the incremental move toward supporting distinct data items in an
      osd request some of the functions had "write_request" parameters to
      indicate, basically, whether the data belonged to in_data or the
      out_data.  Now that we maintain the data fields in the op structure
      there is no need to indicate the direction, so get rid of the
      "write_request" parameters.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      406e2c9f
    • A
      rbd: implement layered reads · 8b3e1a56
      Alex Elder committed
      Implement layered read requests for format 2 rbd images.
      
      If an rbd image is a clone of a snapshot, the snapshot will be the
      clone's "parent" image.  When an object read request on a clone
      comes back with ENOENT it indicates that the clone is not yet
      populated with that portion of the image's data, and the parent
      image should be consulted to satisfy the read.
      
      When this occurs, a new image request is created, directed to the
      parent image.  The offset and length of the image are the same as
      the image-relative offset and length of the object request that
      produced ENOENT.  Data from the parent image therefore satisfies the
      object read request for the original image request.
      
      While this code works, it will not be active until we enable the
      layering feature (by adding RBD_FEATURE_LAYERING to the value of
      RBD_FEATURES_SUPPORTED).
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      8b3e1a56
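The fallback this commit describes (a read that misses in the clone is retried against the parent at the same image-relative offset and length) can be modeled in a few lines. The model is heavily simplified and uses hypothetical names: each image is a single fixed-size buffer rather than a set of objects, and a read with no data and no parent just returns `-ENOENT`, which is not a claim about what the driver does in that case.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_IMG_SIZE 8

struct toy_image {
    const char *data;           /* NULL models "object does not exist" */
    struct toy_image *parent;   /* NULL for a non-layered image */
};

/* Read len bytes at off, consulting the parent when the clone has no data. */
static int toy_read(const struct toy_image *img, size_t off, size_t len,
                    char *buf)
{
    if (off > TOY_IMG_SIZE || len > TOY_IMG_SIZE - off)
        return -EINVAL;
    if (img->data) {
        memcpy(buf, img->data + off, len);
        return 0;
    }
    if (img->parent)            /* same image-relative offset and length */
        return toy_read(img->parent, off, len, buf);
    return -ENOENT;
}
```

Because the parent read uses the same offset and length, its data drops straight into the buffer of the original request.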
    • A
      rbd: probe the parent of an image if present · 2f82ee54
      Alex Elder committed
      Call the probe function for the parent device if one is present.
      Since we don't formally support the layering feature we won't
      be using this functionality just yet.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      2f82ee54
    • A
      rbd: add an object request flag for image data objects · 6365d33a
      Alex Elder committed
      Add a flag to distinguish between object requests being done on
      standalone objects and requests being sent for objects representing
      rbd image data (i.e., object requests that are the result of an
      image request).
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      6365d33a
    • A
      rbd: define an rbd object request flags field · 926f9b3f
      Alex Elder committed
      We're going to need some more Boolean values for object requests,
      so create a flags bit field and use it to record whether the request
      is done.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      926f9b3f
    • A
      rbd: encapsulate image object end request handling · 1217857f
      Alex Elder committed
      Encapsulate the code that completes processing of an object request
      that's part of an image request.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      1217857f
    • A
      rbd: define image request layered flag · d0b2e944
      Alex Elder committed
      Define a flag indicating whether an image request is for a layered
      image (one with a parent image to which requests will be redirected
      if the target object of a request does not exist).  The code that
      checks this flag will be added shortly.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      d0b2e944
    • A
      rbd: define image request originator flag · 9849e986
      Alex Elder committed
      Define a flag indicating whether an image request originated from
      the Linux block layer (from blk_fetch_request()) or whether it was
      initiated in order to satisfy an object request for a child image
      of a layered rbd device.  For image requests initiated by objects of
      child images we'll save a pointer to the object request rather than
      the Linux block request.
      
      For now, only block requests are used.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      9849e986
    • A
      rbd: define image request flags · 0c425248
      Alex Elder committed
      There are several Boolean values we'll be maintaining for image
      requests.  Switch from the single write_request field to a
      general-purpose flags field, and use one of its bits to represent
      the direction of I/O for the image request.  Define helper functions
      for setting and testing that flag.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      0c425248
    • A
      rbd: record image-relative offset in object requests · 7da22d29
      Alex Elder committed
      For an image object request we will need to know what offset within
      the rbd image the request covers.  Record that when the object
      request gets created.
      
      Update the I/O error warnings to use this offset, so that what is
      reported is more informative.
      
      Rename a local variable to fit the convention used everywhere else.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      7da22d29
    • A
      rbd: record aggregate image transfer count · 55f27e09
      Alex Elder committed
      Compute the total number of bytes transferred for an image
      request--the sum across each of the request's object requests.
      To avoid contention do it only when all object requests are
      complete, in rbd_img_request_complete().
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      55f27e09
    • A
      rbd: record overall image request result · a5a337d4
      Alex Elder committed
      If any image object request produces a non-zero result, preserve
      that as the result of the overall image request.  If multiple
      objects have non-zero results, save only the first one.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      a5a337d4
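The "keep the first non-zero result" rule is small enough to show directly. The struct and function names are hypothetical, and "first" here simply means the first one reported to the helper; the sketch makes no claim about the order in which the driver processes completions.

```c
#include <assert.h>
#include <errno.h>

/* Toy image request carrying only the overall result. */
struct toy_img_req { int result; };

/* Fold one object request's result into the image request. */
static void toy_obj_result(struct toy_img_req *img, int obj_result)
{
    if (obj_result && !img->result)
        img->result = obj_result;   /* later errors do not overwrite it */
}
```

Successful object requests (result 0) never clear an error that was already recorded.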
    • A
      rbd: update feature bits · 5cbf6f12
      Alex Elder committed
      There is a new rbd feature bit defined for "fancy striping." Add
      it to the ones defined in the kernel client.
      
      Change RBD_FEATURES_ALL so it represents the set of all feature
      bits (rather than just the ones we support).  Define a new symbol
      RBD_FEATURES_SUPPORTED to indicate the supported ones.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      5cbf6f12
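Separating "all defined feature bits" from "supported feature bits" makes it straightforward to reject an image that needs something the client cannot do. In the sketch below the bit positions, the `TOY_` names, and the `-ENXIO` return value are assumptions for illustration, and the supported set is passed in as a parameter because the commit message does not say which bits it contains.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed bit positions; treat them as illustrative only. */
#define TOY_FEATURE_LAYERING    (1ULL << 0)
#define TOY_FEATURE_STRIPINGV2  (1ULL << 1)
#define TOY_FEATURES_ALL \
        (TOY_FEATURE_LAYERING | TOY_FEATURE_STRIPINGV2)

/* Reject an image whose required features include an unsupported bit. */
static int toy_check_features(uint64_t required, uint64_t supported)
{
    if (required & ~supported)
        return -ENXIO;
    return 0;
}
```

Enabling a feature later then only means adding its bit to the supported set, while the definition of the full set stays unchanged.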
    • A
      libceph: make method call data be a separate data item · 04017e29
      Alex Elder committed
      Right now the data for a method call is specified via a pointer and
      length, and it's copied--along with the class and method name--into
      a pagelist data item to be sent to the osd.  Instead, encode the
      data in a data item separate from the class and method names.
      
      This will allow large amounts of data to be supplied to methods
      without copying.  Only rbd uses the class functionality right now,
      and when it really needs this it will probably need to use a page
      array rather than a page list.  But this simple implementation
      demonstrates the functionality on the osd client, and that's enough
      for now.
      
      This resolves:
          http://tracker.ceph.com/issues/4104
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      04017e29
    • A
      libceph: combine initializing and setting osd data · a4ce40a9
      Alex Elder committed
      This ends up being a rather large patch but what it's doing is
      somewhat straightforward.
      
      Basically, this is replacing two calls with one.  The first of the
      two calls is initializing a struct ceph_osd_data with data (either a
      page array, a page list, or a bio list); the second is setting an
      osd request op so it associates that data with one of the op's
      parameters.  In place of those two will be a single function that
      initializes the op directly.
      
      That means we sort of fan out a set of the needed functions:
          - extent ops with pages data
          - extent ops with pagelist data
          - extent ops with bio list data
      and
          - class ops with page data for receiving a response
      
      We also have to define another one, but it's only used internally:
          - class ops with pagelist data for request parameters
      
      Note that we *still* haven't gotten rid of the osd request's
      r_data_in and r_data_out fields.  All the osd ops refer to them for
      their data.  For now, these data fields are pointers assigned to the
      appropriate r_data_* field when these new functions are called.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      a4ce40a9