1. 26 Feb, 2013 3 commits
  2. 20 Feb, 2013 4 commits
  3. 19 Feb, 2013 13 commits
  4. 14 Feb, 2013 4 commits
    • A
      libceph: don't require r_num_pages for bio requests · 9cbb1d72
      Alex Elder committed
      There is a check in the completion path for osd requests that
      ensures the number of pages allocated is enough to hold the amount
      of incoming data expected.
      
      For bio requests coming from rbd the "number of pages" is not really
      meaningful (although total length would be).  So stop requiring that
      nr_pages be supplied for bio requests.  This is done by checking
      whether the pages pointer is null before checking the value of
      nr_pages.
      
      Note that this value is passed on to the messenger, but there it's
      only used for debugging--it's never used for validation.
      
      While here, change another spot that used r_pages in a debug message
      inappropriately, and also invalidate the r_con_filling_msg pointer
      after dropping a reference to it.
      
      This resolves:
          http://tracker.ceph.com/issues/3875
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      9cbb1d72
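The null-pointer-first check described in this commit can be illustrated with a minimal sketch. The structure and function names below (`sketch_request`, `sketch_can_receive`) are hypothetical stand-ins, not the kernel's own definitions; the sketch only shows the order of the two tests.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an osd request; not the kernel's struct. */
struct sketch_request {
    void **pages;            /* NULL when the data is carried by a bio list */
    unsigned int num_pages;  /* only meaningful when pages != NULL */
};

/* Returns true if the request may receive `want` pages of incoming data. */
static bool sketch_can_receive(const struct sketch_request *req,
                               unsigned int want)
{
    /* Bio-backed request: no page vector, so the page count is not checked. */
    if (req->pages == NULL)
        return true;

    /* Page-backed request: the allocated pages must cover the incoming data. */
    return req->num_pages >= want;
}
```

With this ordering, a bio-backed request whose page count was never filled in is accepted, while a page-backed request is still validated against its page count.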
    • A
      rbd: don't take extra bio reference for osd client · 1e32d34c
      Alex Elder committed
      Currently, if the OSD client finds an osd request has had a bio list
      attached to it, it drops a reference to it (or rather, to the first
      entry on that list) when the request is released.
      
      The code that added that reference (i.e., the rbd client) is
      therefore required to take an extra reference to that first bio
      structure.
      
      The osd client doesn't really do anything with the bio pointer other
      than transfer it from the osd request structure to outgoing (for
      writes) and ingoing (for reads) messages.  So it really isn't the
      right place to be taking or dropping references.
      
      Furthermore, the rbd client already holds references to all bio
      structures it passes to the osd client, and holds them until the
      request is completed.  So there's no need for this extra reference
      whatsoever.
      
      So remove the bio_put() call in ceph_osdc_release_request(), as
      well as its matching bio_get() call in rbd_osd_req_create().
      
      This change could lead to a crash if an old libceph.ko were used
      with a new rbd.ko.  Add a compatibility check at rbd initialization
      time to avoid this possibility.
      
      This resolves:
          http://tracker.ceph.com/issues/3798    and
          http://tracker.ceph.com/issues/3799
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      1e32d34c
    • A
      libceph: add a compatibility check interface · 72fe25e3
      Alex Elder committed
      An upcoming change implements a semantic change that could lead to
      a crash if an old version of the libceph kernel module is used with
      a new version of the rbd kernel module.
      
      In order to preclude that possibility, this adds a compatibility
      check interface.  If this interface doesn't exist, the modules are
      obviously not compatible.  But if it does exist, this provides a way
      of letting the caller know whether it will operate properly with
      this libceph module.
      
      Perhaps confusingly, it returns false right now.  The semantic
      change mentioned above will make it return true.
      
      This resolves:
          http://tracker.ceph.com/issues/3800
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      72fe25e3
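A rough sketch of how such a compatibility check might be used is shown below. Both function names (`sketch_lib_compatible`, `sketch_client_init`) and the unused `data` argument are assumptions made for illustration, not the interface this commit actually adds.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical library-side check; `data` is an unused placeholder argument. */
bool sketch_lib_compatible(void *data)
{
    (void)data;
    return false;   /* the later semantic change would make this return true */
}

/* Hypothetical client-side init: refuse to start against an incompatible library. */
int sketch_client_init(void)
{
    if (!sketch_lib_compatible(NULL))
        return -1;  /* a kernel module would return a negative errno instead */
    return 0;
}
```

In this sketch the client refuses to initialize until the library reports compatibility, which mirrors the "returns false right now" state described above.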
    • A
      libceph: fix messenger CONFIG_BLOCK dependencies · 3ebc21f7
      Alex Elder committed
      The ceph messenger has a few spots that are only used when
      bio messages are supported, and that's only when CONFIG_BLOCK
      is defined.  This surrounds a couple of spots with #ifdef's
      that would cause a problem if CONFIG_BLOCK were not present
      in the kernel configuration.
      
      This resolves:
          http://tracker.ceph.com/issues/3976
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      3ebc21f7
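The kind of guard this commit describes might look like the following sketch. The structure and function (`sketch_msg`, `sketch_msg_init`) are hypothetical; only the `CONFIG_BLOCK` symbol comes from the commit message.

```c
#include <stddef.h>

/* Hypothetical message structure; the bio field exists only with block support. */
struct sketch_msg {
    size_t data_len;
#ifdef CONFIG_BLOCK
    void *bio;      /* stand-in for a struct bio pointer */
#endif
};

static void sketch_msg_init(struct sketch_msg *m)
{
    m->data_len = 0;
#ifdef CONFIG_BLOCK
    m->bio = NULL;  /* compiled only when block support is configured */
#endif
}
```

Every use of the bio field is wrapped in the same guard as its declaration, so the code builds whether or not `CONFIG_BLOCK` is defined.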
  5. 26 Jan, 2013 1 commit
    • C
      libceph: fix undefined behavior when using snprintf() · 1ec3911d
      Cong Ding committed
      The variable "str" is used as both the source and the destination
      in a call to snprintf(), which is undefined behavior according to
      C11.  The relevant wording in C11 is:
      	"If copying takes place between objects that
      	overlap, the behavior is undefined."
      
      Also, ceph_osdmap_state_str() is meant to return the osdmap state,
      so it should return "doesn't exist" when none of the conditions is
      satisfied.  This patch fixes that as well.
      
      [elder@inktank.com: shortened the commit message]
      Signed-off-by: Cong Ding <dinggnu@gmail.com>
      Reviewed-by: Alex Elder <elder@inktank.com>
      1ec3911d
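One way to avoid the overlapping-buffer pattern is to write each possible result directly from a string literal, as in the sketch below. The flag values and the function name (`SK_EXISTS`, `SK_UP`, `sketch_state_str`) are hypothetical and the code is not claimed to match the actual patch; it only shows a formulation in which the destination buffer is never also a source.

```c
#include <stdio.h>
#include <string.h>

#define SK_EXISTS 1   /* hypothetical flag values, not the kernel's */
#define SK_UP     2

/* Writes a description of `state` into buf and returns buf. */
static char *sketch_state_str(char *buf, size_t len, int state)
{
    /* Each branch formats from a literal, so buf is only ever a destination. */
    if ((state & SK_EXISTS) && (state & SK_UP))
        snprintf(buf, len, "%s", "exists, up");
    else if (state & SK_EXISTS)
        snprintf(buf, len, "%s", "exists");
    else if (state & SK_UP)
        snprintf(buf, len, "%s", "up");
    else
        snprintf(buf, len, "%s", "doesn't exist");
    return buf;
}
```

The final branch covers the case where no condition holds, which is where the commit message says "doesn't exist" should be returned.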
  6. 18 Jan, 2013 14 commits
    • A
      libceph: pass num_op with ops · ae7ca4a3
      Alex Elder committed
      Both ceph_osdc_alloc_request() and ceph_osdc_build_request() are
      provided an array of ceph osd request operations.  Rather than just
      passing the number of operations in the array, the caller is
      required to append an additional zeroed operation structure to
      signal the end of the array.
      
      All callers know the number of operations at the time these
      functions are called, so drop the silly zero entry and supply that
      number directly.  As a result, get_num_ops() is no longer needed.
      This also means that ceph_osdc_alloc_request() never uses its ops
      argument, so that can be dropped.
      
      Also rbd_create_rw_ops() no longer needs to add one to reserve room
      for the additional op.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      ae7ca4a3
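The difference between the two calling conventions can be sketched as follows. The types and functions here (`sketch_op`, `sketch_count_ops`, `sketch_sum_opcodes`) are hypothetical illustrations, not the libceph code, and the sketch assumes a zero opcode served as the old end marker.

```c
#include <stddef.h>

struct sketch_op { int opcode; };   /* hypothetical; opcode 0 is the old sentinel */

/* Old style: walk the array until the zeroed terminating entry is found. */
static size_t sketch_count_ops(const struct sketch_op *ops)
{
    size_t n = 0;

    while (ops[n].opcode != 0)
        n++;
    return n;
}

/* New style: the caller passes the count, so no sentinel entry is needed. */
static int sketch_sum_opcodes(const struct sketch_op *ops, size_t num_ops)
{
    int sum = 0;

    for (size_t i = 0; i < num_ops; i++)
        sum += ops[i].opcode;
    return sum;
}
```

With the count supplied directly, the caller no longer has to allocate one extra array slot, and the counting helper becomes unnecessary.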
    • A
      libceph: don't set pages or bio in ceph_osdc_alloc_request() · 54a54007
      Alex Elder committed
      Only one of the two callers of ceph_osdc_alloc_request() provides
      page or bio data for its payload.  And essentially all that function
      was doing with those arguments was assigning them to fields in the
      osd request structure.
      
      Simplify ceph_osdc_alloc_request() by having the caller take care of
      making those assignments.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      54a54007
    • A
      libceph: don't set flags in ceph_osdc_alloc_request() · d178a9e7
      Alex Elder committed
      The only thing ceph_osdc_alloc_request() really does with the
      flags value it is passed is assign it to the newly-created
      osd request structure.  Do that in the caller instead.
      
      Both callers subsequently call ceph_osdc_build_request(), so have
      that function (instead of ceph_osdc_alloc_request()) issue a warning
      if a request comes through with neither the read nor write flags set.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      d178a9e7
    • A
      libceph: drop osdc from ceph_calc_raw_layout() · e75b45cf
      Alex Elder committed
      The osdc parameter to ceph_calc_raw_layout() is not used, so get rid
      of it.  Consequently, the corresponding parameter in calc_layout()
      becomes unused, so get rid of that as well.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      e75b45cf
    • A
      libceph: drop snapid in ceph_calc_raw_layout() · 4d6b250b
      Alex Elder committed
      A snapshot id must be provided to ceph_calc_raw_layout() even though
      it is not needed at all for calculating the layout.
      
      Where the snapshot id *is* needed is when building the request
      message for an osd operation.
      
      Drop the snapid parameter from ceph_calc_raw_layout() and pass
      that value instead in ceph_osdc_build_request().
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      4d6b250b
    • A
      libceph: pass length to ceph_calc_file_object_mapping() · e8afad65
      Alex Elder committed
      ceph_calc_file_object_mapping() takes (among other things) a "file"
      offset and length, and based on the layout, determines the object
      number ("bno") backing the affected portion of the file's data and
      the offset into that object where the desired range begins.  It also
      computes the size that should be used for the request--either the
      amount requested or something less if that would exceed the end of
      the object.
      
      This patch changes the input length parameter in this function so it
      is used only for input.  That is, the argument will be passed by
      value rather than by address, so the value provided won't get
      updated by the function.
      
      The value would only get updated if the length would surpass the
      current object, and in that case the value it got updated to would
      be exactly that returned in *oxlen.
      
      Only one of the two callers is affected by this change.  Update
      ceph_calc_raw_layout() so it records any updated value.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      e8afad65
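The by-value length and the clipped result can be illustrated with a deliberately simplified sketch. It ignores striping entirely and assumes the file is cut into consecutive objects of a fixed, nonzero size, which is not how the real layout calculation works in general; the function name `sketch_map_extent` is hypothetical, while `bno` and `oxlen` follow the names used in the commit message.

```c
#include <stdint.h>

/*
 * Simplified mapping that ignores striping: the file is assumed to be
 * cut into consecutive objects of object_size bytes (object_size > 0).
 * len is passed by value; the possibly shortened length is returned
 * only through *oxlen.
 */
static void sketch_map_extent(uint64_t object_size, uint64_t off, uint64_t len,
                              uint64_t *bno, uint64_t *oxoff, uint64_t *oxlen)
{
    uint64_t room;

    *bno = off / object_size;       /* object number backing this offset */
    *oxoff = off % object_size;     /* offset within that object */
    room = object_size - *oxoff;    /* bytes left before the object ends */
    *oxlen = len < room ? len : room;
}
```

A caller that previously relied on its length variable being updated in place would now copy the returned value itself, for example `len = oxlen;` after the call.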
    • A
      libceph: pass length to ceph_osdc_build_request() · 0120be3c
      Alex Elder committed
      The len argument to ceph_osdc_build_request() is set up to be
      passed by address, but that function never updates its value
      so there's no need to do this.  Tighten up the interface by
      passing the length directly.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      0120be3c
    • A
      libceph: kill op_needs_trail() · 5b9d1b1c
      Alex Elder committed
      Since every osd message is now prepared to include trailing data,
      there's no need to check ahead of time whether any operations will
      make use of the trail portion of the message.
      
      We can drop the second argument to get_num_ops(), and as a result we
      can also get rid of op_needs_trail() which is no longer used.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      5b9d1b1c
    • A
      libceph: always allow trail in osd request · c885837f
      Alex Elder committed
      An osd request structure contains an optional trail portion, which
      if present will contain data to be passed in the payload portion of
      the message containing the request.  The trail field is a
      ceph_pagelist pointer, and if null it indicates there is no trail.
      
      A ceph_pagelist structure contains a length field, and it can
      legitimately hold value 0.  Make use of this to change the
      interpretation of the "trail" of an osd request so that every osd
      request has trailing data, it just might have length 0.
      
      This means we change the r_trail field in a ceph_osd_request
      structure from a pointer to a structure that is always initialized.
      
      Note that in ceph_osdc_start_request(), the trail pointer (or now
      address of that structure) is assigned to a ceph message's trail
      field.  Here's why that's still OK (looking at net/ceph/messenger.c):
          - What would have resulted in a null pointer previously will now
            refer to a 0-length page list.  That message trail pointer
            is used in two functions, write_partial_msg_pages() and
            out_msg_pos_next().
          - In write_partial_msg_pages(), a null page list pointer is
            handled the same as a message with 0-length trail, and both
            result in an "in_trail" variable set to false.  The trail
            pointer is only used if in_trail is true.
          - The only other place the message trail pointer is used is
            out_msg_pos_next().  That function is only called by
            write_partial_msg_pages() and only touches the trail pointer
            if the in_trail value it is passed is true.
      Therefore a null ceph_msg->trail pointer is equivalent to a non-null
      pointer referring to a 0-length page list structure.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      c885837f
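The equivalence argued above, that a null trail pointer and a zero-length page list are treated alike, can be sketched as follows. All names here are hypothetical stand-ins, and the check is much simpler than the messenger's real logic, which also tracks the position within the message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for ceph_pagelist; only the length matters here. */
struct sketch_pagelist { size_t length; };

/* Before: optional trail, with NULL meaning "no trail". */
struct sketch_req_old { struct sketch_pagelist *trail; };

/* After: the trail is always present and may have length 0. */
struct sketch_req_new { struct sketch_pagelist trail; };

/* Consumer check that treats a NULL trail and a 0-length trail identically. */
static bool sketch_in_trail(const struct sketch_pagelist *trail)
{
    return trail != NULL && trail->length > 0;
}
```

Because the consumer only dereferences the trail when this check is true, passing the address of an embedded, empty page list behaves the same as passing a null pointer.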
    • A
      rbd: drop oid parameters from ceph_osdc_build_request() · af77f26c
      Alex Elder committed
      The last two parameters to ceph_osdc_build_request() describe the
      object id, but the values passed always come from the osd request
      structure whose address is also provided.  Get rid of those last
      two parameters.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      af77f26c
    • A
      libceph: reformat __reset_osd() · c3acb181
      Alex Elder committed
      Reformat __reset_osd() into three distinct blocks of code
      handling the three return cases.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      c3acb181
    • S
      crush: avoid recursion if we have already collided · 7d7c1f61
      Sage Weil committed
      This saves us some cycles, but does not affect the placement result at
      all.
      
      This corresponds to ceph.git commit 4abb53d4f.
      Signed-off-by: Sage Weil <sage@inktank.com>
      7d7c1f61
    • J
      libceph: for chooseleaf rules, retry CRUSH map descent from root if leaf is failed · 1604f488
      Jim Schutt committed
      Add libceph support for a new CRUSH tunable recently added to Ceph servers.
      
      Consider the CRUSH rule
        step chooseleaf firstn 0 type <node_type>
      
      This rule means that <n> replicas will be chosen in a manner such that
      each chosen leaf's branch will contain a unique instance of <node_type>.
      
      When an object is re-replicated after a leaf failure, if the CRUSH map uses
      a chooseleaf rule the remapped replica ends up under the <node_type> bucket
      that held the failed leaf.  This causes uneven data distribution across the
      storage cluster, to the point that when all the leaves but one fail under a
      particular <node_type> bucket, that remaining leaf holds all the data from
      its failed peers.
      
      This behavior also limits the number of peers that can participate in the
      re-replication of the data held by the failed leaf, which increases the
      time required to re-replicate after a failure.
      
      For a chooseleaf CRUSH rule, the tree descent has two steps: call them the
      inner and outer descents.
      
      If the tree descent down to <node_type> is the outer descent, and the descent
      from <node_type> down to a leaf is the inner descent, the issue is that a
      down leaf is detected on the inner descent, so only the inner descent is
      retried.
      
      In order to disperse re-replicated data as widely as possible across a
      storage cluster after a failure, we want to retry the outer descent. So,
      fix up crush_choose() to allow the inner descent to return immediately on
      choosing a failed leaf.  Wire this up as a new CRUSH tunable.
      
      Note that after this change, for a chooseleaf rule, if the primary OSD
      in a placement group has failed, choosing a replacement may result in
      one of the other OSDs in the PG colliding with the new primary.  In
      that case, that OSD's data for the PG must be moved as well.  This
      seems unavoidable but should be relatively rare.
      
      This corresponds to ceph.git commit 88f218181a9e6d2292e2697fc93797d0f6d6e5dc.
      Signed-off-by: Jim Schutt <jaschut@sandia.gov>
      Reviewed-by: Sage Weil <sage@inktank.com>
      1604f488
    • Y
      ceph: re-calculate truncate_size for strip object · a41bad1a
      Yan, Zheng committed
      Otherwise the osd may truncate the object to a larger size.
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      a41bad1a
  7. 28 Dec, 2012 1 commit