1. 14 May, 2013 6 commits
    • rbd: don't release write request until necessary · 8785b1d4
      Alex Elder committed
      Previously when a layered write was going to involve a copyup
      request, the original osd request was released before submitting the
      parent full-object read.  The osd request for the copyup would then
      be allocated in rbd_img_obj_parent_read_full_callback().
      
      Shortly we will be handling the event of mapped layered images
      getting flattened, and when that occurs we need to resubmit the
      original request.  We therefore don't want to release the osd
      request until we really know we're going to replace it--in the
      callback function.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: get parent info on refresh · 642a2537
      Alex Elder committed
      Get parent info for format 2 images on every refresh (rather than
      just during the initial probe).  This will be needed to detect the
      disappearance of the parent image in the event a mapped image
      becomes unlayered (i.e., flattened).  Avoid leaking the previous
      parent spec on the second and subsequent times this information is
      requested by dropping the previous one (if any) before updating it.
      (Also, extract the pool id into a local variable before assigning
      it into the parent spec.)
      
      Switch to using a non-zero parent overlap value rather than the
      existence of a parent (a non-null parent_spec pointer) to determine
      whether to mark a request layered.  It will soon be possible for
      a layered image to become unlayered while a request is in flight.
      
      This means that the layered flag for an image request indicates that
      there was a non-zero parent overlap at the time the image request
      was created.  The parent overlap can change thereafter, which may
      lead to special handling at request submission or completion time.
      
      This and the next several patches are related to:
          http://tracker.ceph.com/issues/3763
      
      NOTE:
      If an error occurs while refreshing the parent info (i.e.,
      requesting it after initial probe), the old parent info will
      persist.  This is not really correct, and is a scenario that needs
      to be addressed.  For now we'll assert that the failure mode is
      unlikely, but the issue has been documented in tracker issue 5040.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
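The "layered" rule described in the commit above can be sketched in a few lines of C. This is an illustration only, not the driver's code: the `sketch_` structures and function are hypothetical, and only the `parent_overlap` and `layered` concepts come from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, stripped-down device state: only the parent overlap. */
struct sketch_rbd_device {
    uint64_t parent_overlap;    /* bytes shared with the parent; 0 = none */
};

/* Hypothetical image request carrying the flag discussed above. */
struct sketch_img_request {
    bool layered;               /* overlap was non-zero at creation time */
};

/*
 * Record whether the request is layered by sampling the overlap once,
 * when the request is created.  A later change to the device's overlap
 * (for example after a flatten) does not rewrite the flag.
 */
void sketch_img_request_init(struct sketch_img_request *req,
                             const struct sketch_rbd_device *dev)
{
    req->layered = dev->parent_overlap != 0;
}
```

A request created while the overlap is non-zero keeps `layered == true` even if the overlap later drops to zero, which is why the commit notes that special handling may be needed at submission or completion time.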
    • rbd: ignore zero-overlap parent · 70cf49cf
      Alex Elder committed
      An rbd clone image that has an overlap with its parent of 0 is
      effectively not a layered image at all.  Detect this case and treat
      such an image as non-layered.  Issue a warning to be sure the user
      knows what's going on.
      
      This resolves:
          http://tracker.ceph.com/issues/5028
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: support reading parent page data for writes · b91f09f1
      Alex Elder committed
      Currently, rbd_img_obj_parent_read_full() assumes the incoming
      object request contains bio data.  But if a layered image is part of
      a multi-layer stack of images it will result in read requests of
      page data to parent images.
      
      This is handling the same kind of issue as was resolved by this
      commit:
          5b2ab72d  rbd: support reading parent page data
      
      This resolves:
          http://tracker.ceph.com/issues/5027
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: fix parent request size assumption · ebda6408
      Alex Elder committed
      The code that reads object data from the parent for a copyup on
      write request currently assumes that the size of that request is the
      size of a "full" object from the original target image.
      
      That is not necessarily the case.  The parent overlap could reduce
      the request size below that.  To fix that assumption we need to
      record the number of pages in the copyup_pages array, for both an
      image request and an object request.  Rename a local variable in
      rbd_img_obj_parent_read_full_callback() to reflect we're recording
      the length of the parent read request, not the size of the target
      object.
      
      This resolves:
          http://tracker.ceph.com/issues/5038
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
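To make the size issue in the commit above concrete, here is a rough sketch of how a parent read can end up shorter than a full object, and why the page count then has to be recorded separately. The function names, the 4096-byte page size, and the clamping rule are assumptions made for illustration; they are not taken from the driver.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for the example */

/*
 * Length of the parent read for the object that starts at img_offset.
 * The read is clamped to the part of the object that lies inside the
 * parent overlap, so it may be shorter than object_size (or zero).
 */
uint64_t sketch_parent_read_length(uint64_t img_offset,
                                   uint64_t object_size,
                                   uint64_t parent_overlap)
{
    uint64_t remaining;

    if (img_offset >= parent_overlap)
        return 0;               /* object lies entirely past the overlap */
    remaining = parent_overlap - img_offset;
    return remaining < object_size ? remaining : object_size;
}

/* Number of pages needed to hold `length` bytes (rounded up). */
uint32_t sketch_page_count(uint64_t length)
{
    return (uint32_t)((length + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE);
}
```

With 4 MiB objects and an overlap that ends 10000 bytes into the second object, the second object's parent read is 10000 bytes and needs 3 pages rather than the 1024 pages a full object would need.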
    • libceph: init sent and completed when starting · c10ebbf5
      Alex Elder committed
      The rbd code has a need to be able to restart an osd request that
      has already been started and completed once before.  This currently
      wouldn't work right because the osd client code assumes an osd
      request will be started exactly once.  Certain fields in a request
      are never cleared and this leads to trouble if you try to reuse it.
      
      Specifically, the r_sent, r_got_reply, and r_completed fields are
      never cleared.  The r_sent field records the osd incarnation at the
      time the request was sent to that osd.  If that's non-zero, the
      message won't get re-mapped to a target osd properly, and won't be
      put on the unsafe requests list the first time it's sent as it
      should.  The r_got_reply field is used in handle_reply() to ensure
      the reply to a request is processed only once.  And the r_completed
      field is used for lingering requests to avoid calling the callback
      function every time the osd client re-sends the request on behalf of
      its initiator.
      
      Each osd request passes through ceph_osdc_start_request() when
      responsibility for the request is handed over to the osd client for
      completion.  We can safely zero these three fields there each time a
      request gets started.
      
      One last related change--clear the r_linger flag when a request
      is no longer registered as a linger request.
      
      This resolves:
          http://tracker.ceph.com/issues/5026
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
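The fix described in the commit above amounts to resetting per-submission state at the point where a request is handed to the osd client. A minimal sketch follows, assuming a simplified stand-in structure: the field names r_sent, r_got_reply, and r_completed come from the commit message, while the struct, the field types, and the `sketch_start_request()` helper are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an osd request; field types are assumed. */
struct sketch_osd_request {
    uint32_t r_sent;        /* osd incarnation when sent; 0 = not yet sent */
    bool     r_got_reply;   /* a reply has already been processed */
    bool     r_completed;   /* callback already called (lingering requests) */
};

/*
 * Models the hand-off to the osd client.  Clearing the three fields on
 * every start lets a request that already completed once be started
 * again without carrying stale state from the previous submission.
 */
void sketch_start_request(struct sketch_osd_request *req)
{
    assert(req != NULL);
    req->r_sent = 0;
    req->r_got_reply = false;
    req->r_completed = false;
    /* ... mapping to a target osd and sending would follow here ... */
}
```

Resetting in the start path, rather than at completion, keeps the rule in one place: whatever happened to the request earlier, each start begins from the same clean state.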
  2. 09 May, 2013 12 commits
  3. 08 May, 2013 8 commits
  4. 03 May, 2013 10 commits
  5. 02 May, 2013 4 commits
    • rbd: kill off the snapshot list · 33dca39f
      Alex Elder committed
      We no longer use the snapshot list for anything.  When we need to
      look up a snapshot name, id, size, or feature mask, we just do it
      directly rather than relying on this list being updated with every
      refresh.  The main reason it existed was for the benefit of the
      device/sysfs entries that previously were associated with snapshots.
      
      So get rid of the snapshot list, and struct rbd_snap, and the
      hundreds of lines of code that supported them.
      
      This resolves:
          http://tracker.ceph.com/issues/4868
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: define rbd_snap_size() and rbd_snap_features() · 2ad3d716
      Alex Elder committed
      This patch defines a handful of new functions that will allow
      us to get rid of the rbd device structure's list of snapshots.
      
      Define rbd_snap_id_by_name() to look up a snapshot id given its
      name.  This is efficient for format 1 images but not for format 2.
      Fortunately it only gets called at mapping time so it's not that
      critical.
      
      Use rbd_snap_id_by_name() to find out the id for a snapshot getting
      mapped, and pass that id to new functions rbd_snap_size() and
      rbd_snap_features() to look up information about a given snapshot's
      size and feature mask given its snapshot id.  All this gets done
      in rbd_dev_mapping_set().
      
      As a result, snap_by_name() is no longer needed, so get rid of it.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: use snap_id not index to look up snap info · 54cac61f
      Alex Elder committed
      In order to align with what was needed for format 1 rbd images,
      rbd_dev_v2_snap_info() was set up to take as argument an index into
      the array of snapshot ids in a rbd device's snapshot context.
      
      This switches that around, so we pass the snapshot id instead.
      In doing this, rbd_snap_name() now returns a dynamically-allocated
      string rather than a fixed one, so there's no need to make a
      duplicate in its caller, rbd_dev_spec_update().
      
      This means the following functions take a snapshot id where they
      previously used an index value:
          rbd_dev_snap_info()
          rbd_dev_v1_snap_info()
          rbd_dev_v2_snap_info()
      
      A new function, rbd_dev_snap_index(), determines the snap index for
      format 1 images and uses it to look up the name.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: look up snapshot name in names buffer · 9682fc6d
      Alex Elder committed
      Rather than scanning the list of snapshot structures for it, scan
      the snapshot context buffer containing snapshot names in order to
      determine for a format 1 image the name associated with a given
      snapshot id.
      
      Pull out the part of rbd_dev_v1_snap_info() that does this scan into
      a new function, _rbd_dev_v1_snap_name().  Have that function return
      a dynamically-allocated copy of the name, and don't duplicate it in
      rbd_dev_v1_snap_info().
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
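The names-buffer scan in the commit above can be illustrated roughly as follows. The sketch assumes the buffer holds consecutive NUL-terminated names in the same order as the array of snapshot ids; that layout and the `sketch_snap_name()` helper are assumptions for the example, not a description of the driver's actual data structures.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a dynamically-allocated copy of the name for snap_id, or NULL
 * if the id is not present (or allocation fails).
 *
 * Assumed layout: ids[i] corresponds to the i-th NUL-terminated string
 * packed back to back in `names`.
 */
char *sketch_snap_name(const uint64_t *ids, size_t count,
                       const char *names, uint64_t snap_id)
{
    const char *name = names;
    size_t i;

    for (i = 0; i < count; i++) {
        size_t len = strlen(name) + 1;      /* include the terminator */

        if (ids[i] == snap_id) {
            char *copy = malloc(len);

            if (copy)
                memcpy(copy, name, len);
            return copy;                    /* caller frees the copy */
        }
        name += len;                        /* step to the next name */
    }
    return NULL;
}
```

Because the helper hands back its own copy, a caller does not need to duplicate the string again, which matches the change the commit makes to rbd_dev_v1_snap_info().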