1. 22 Apr 2015, 1 commit
  2. 05 Apr 2014, 4 commits
    • crush: add SET_CHOOSELEAF_VARY_R step · d83ed858
      Ilya Dryomov committed
      This lets you adjust the vary_r tunable on a per-rule basis.
      
      Reflects ceph.git commit f944ccc20aee60a7d8da7e405ec75ad1cd449fac.
      Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
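
      A minimal C sketch of how a per-rule "set" step can override the
      map-wide tunable while the rule executes.  The struct and opcode
      names below are illustrative, not the kernel's definitions; the
      real handling lives in crush_do_rule() in net/ceph/crush/mapper.c.

        /* Illustrative stand-in for the kernel's rule-step type. */
        struct rule_step {
                int op;                 /* e.g. a "set chooseleaf_vary_r" opcode */
                int arg1;               /* new value, or negative for "leave alone" */
        };

        enum { RULE_SET_CHOOSELEAF_VARY_R = 1 };  /* opcode value is made up */

        /* Start from the map-wide tunable and let the rule step override it. */
        static int effective_vary_r(const struct rule_step *step, int map_vary_r)
        {
                int vary_r = map_vary_r;

                if (step->op == RULE_SET_CHOOSELEAF_VARY_R && step->arg1 >= 0)
                        vary_r = step->arg1;

                return vary_r;
        }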
    • crush: add chooseleaf_vary_r tunable · e2b149cc
      Ilya Dryomov committed
      The current crush_choose_firstn code will re-use the same 'r' value for
      the recursive call.  That means that if we are hitting a collision or
      rejection for some reason (say, an OSD that is marked out) and need to
      retry, we will keep making the same (bad) choice in that recursive
      selection.
      
      Introduce a tunable that fixes that behavior by incorporating the parent
      'r' value into the recursive starting point, so that a different path
      will be taken in subsequent placement attempts.
      
      Note that this was done from the get-go for the new crush_choose_indep
      algorithm.
      
      This was exposed by a user who was seeing PGs stuck in active+remapped
      after reweight-by-utilization because the up set mapped to a single OSD.
      
      Reflects ceph.git commit a8e6c9fbf88bad056dd05d3eb790e98a5e43451a.
      Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
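
      A hedged sketch of the idea behind the tunable (helper name
      assumed, not the kernel function): the inner, recursive descent
      derives its starting 'r' from the parent's attempt number instead
      of always starting at 0, so an outer retry explores a different
      inner path.

        /*
         * vary_r == 0 keeps the legacy behaviour: the recursive call
         * always restarts at r = 0 and repeats the same (bad) choice.
         * vary_r != 0 folds the parent's 'r' into the starting point.
         */
        static int recursive_start_r(int parent_r, int vary_r)
        {
                return vary_r ? parent_r : 0;
        }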
    • crush: allow crush rules to set (re)tries counts to 0 · 6ed1002f
      Ilya Dryomov committed
      These two fields are misnomers; they are *retry* counts.
      
      Reflects ceph.git commit f17caba8ae0cad7b6f8f35e53e5f73b444696835.
      Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
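
      A small illustrative sketch (names assumed) of the resulting
      semantics: a per-rule value of 0 is a valid override meaning "no
      retries", and only a negative (unset) value falls back to the
      map-wide default.

        static int effective_retries(int rule_value, int default_retries)
        {
                /* 0 now means "no retries"; negative means "not set". */
                return rule_value >= 0 ? rule_value : default_retries;
        }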
    • crush: fix off-by-one errors in total_tries refactor · 48a163db
      Ilya Dryomov committed
      Back in 27f4d1f6bc32c2ed7b2c5080cbd58b14df622607 we refactored the CRUSH
      code to allow adjustment of the retry counts on a per-pool basis.  That
      commit had an off-by-one bug: the previous "tries" counter was a *retry*
      count, not a *try* count, but the new code was passing in 1 meaning
      there should be no retries.
      
      Change the ftotal vs tries comparison to use < instead of <= to fix the
      problem.  Note that the original code used <= here, which means the
      global "choose_total_tries" tunable is actually counting retries.
      Compensate for that by adding 1 in crush_do_rule when we pull the tunable
      into the local variable.
      
      This was noticed while looking at output from a user-provided osdmap.
      Unfortunately the map doesn't illustrate the change in mapping behavior
      and I haven't managed to construct one yet that does.  Inspection of the
      crush debug output now aligns with prior versions, though.
      
      Reflects ceph.git commit 795704fd615f0b008dcc81aa088a859b2d075138.
      Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
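
      An illustrative sketch of the two compensating pieces of the fix;
      the helper names are made up, only the < comparison and the +1
      adjustment mirror what the commit describes.

        /* "tries" is a try count, so another attempt is allowed while
         * ftotal < tries ... */
        static int may_try_again(unsigned int ftotal, unsigned int tries)
        {
                return ftotal < tries;          /* was <=, an off-by-one */
        }

        /* ... and the legacy tunable, which historically counted
         * retries, is bumped by one when loaded into the local try
         * budget in crush_do_rule(). */
        static unsigned int try_budget(unsigned int choose_total_tries)
        {
                return choose_total_tries + 1;
        }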
  3. 01 Jan 2014, 18 commits
  4. 18 Jan 2013, 2 commits
    • crush: avoid recursion if we have already collided · 7d7c1f61
      Sage Weil committed
      This saves us some cycles, but does not affect the placement result at
      all.
      
      This corresponds to ceph.git commit 4abb53d4f.
      Signed-off-by: Sage Weil <sage@inktank.com>
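
      A compact sketch of the ordering (helper shapes assumed): test
      the candidate against the items already chosen before recursing
      to a leaf, since a colliding candidate would be rejected anyway.

        /* Returns nonzero if 'item' duplicates an earlier choice. */
        static int collides(int item, const int *out, int outpos)
        {
                int i;

                for (i = 0; i < outpos; i++)
                        if (out[i] == item)
                                return 1;
                return 0;
        }

        /* Only pay for the recursive leaf descent when there is no
         * collision; the placement result is unchanged. */
        static int consider(int item, const int *out, int outpos,
                            int (*descend_to_leaf)(int item))
        {
                if (collides(item, out, outpos))
                        return -1;              /* reject early, retry */
                return descend_to_leaf(item);
        }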
    • libceph: for chooseleaf rules, retry CRUSH map descent from root if leaf is failed · 1604f488
      Jim Schutt committed
      Add libceph support for a new CRUSH tunable recently added to Ceph servers.
      
      Consider the CRUSH rule
        step chooseleaf firstn 0 type <node_type>
      
      This rule means that <n> replicas will be chosen in a manner such that
      each chosen leaf's branch will contain a unique instance of <node_type>.
      
      When an object is re-replicated after a leaf failure, if the CRUSH map uses
      a chooseleaf rule the remapped replica ends up under the <node_type> bucket
      that held the failed leaf.  This causes uneven data distribution across the
      storage cluster, to the point that when all the leaves but one fail under a
      particular <node_type> bucket, that remaining leaf holds all the data from
      its failed peers.
      
      This behavior also limits the number of peers that can participate in the
      re-replication of the data held by the failed leaf, which increases the
      time required to re-replicate after a failure.
      
      For a chooseleaf CRUSH rule, the tree descent has two steps: call them the
      inner and outer descents.
      
      If the tree descent down to <node_type> is the outer descent, and the descent
      from <node_type> down to a leaf is the inner descent, the issue is that a
      down leaf is detected on the inner descent, so only the inner descent is
      retried.
      
      In order to disperse re-replicated data as widely as possible across a
      storage cluster after a failure, we want to retry the outer descent. So,
      fix up crush_choose() to allow the inner descent to return immediately on
      choosing a failed leaf.  Wire this up as a new CRUSH tunable.
      
      Note that after this change, for a chooseleaf rule, if the primary OSD
      in a placement group has failed, choosing a replacement may result in
      one of the other OSDs in the PG colliding with the new primary.  This
      requires that OSD's data for that PG be moved as well.  This
      seems unavoidable but should be relatively rare.
      
      This corresponds to ceph.git commit 88f218181a9e6d2292e2697fc93797d0f6d6e5dc.
      Signed-off-by: Jim Schutt <jaschut@sandia.gov>
      Reviewed-by: Sage Weil <sage@inktank.com>
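
      A schematic sketch of the inner/outer interplay described above
      (helper shapes assumed): with the new tunable enabled, an inner
      descent that picks a failed leaf returns immediately instead of
      retrying locally, so the outer descent restarts from the root and
      can land under a different <node_type> bucket.

        /* Inner (chooseleaf) descent below 'bucket'; returns a leaf or -1. */
        static int choose_leaf(int bucket, int descend_once,
                               int (*inner_descent)(int bucket),
                               int (*leaf_is_failed)(int leaf))
        {
                for (;;) {
                        int leaf = inner_descent(bucket);

                        if (!leaf_is_failed(leaf))
                                return leaf;    /* usable leaf found */
                        if (descend_once)
                                return -1;      /* give up: the outer descent
                                                 * retries from the root */
                        /* legacy behaviour: retry only under this bucket */
                }
        }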
  5. 31 Jul 2012, 1 commit
  6. 08 May 2012, 5 commits
  7. 27 Apr 2012, 1 commit
  8. 16 Apr 2012, 1 commit
  9. 11 Jan 2012, 1 commit
  10. 13 Dec 2011, 1 commit
  11. 21 Oct 2010, 1 commit
    • ceph: factor out libceph from Ceph file system · 3d14c5d2
      Yehuda Sadeh committed
      This factors out protocol and low-level storage parts of ceph into a
      separate libceph module living in net/ceph and include/linux/ceph.  This
      is mostly a matter of moving files around.  However, a few key pieces
      of the interface change as well:
      
       - ceph_client is split into ceph_fs_client and ceph_client, where the latter
         captures the mon and osd clients, and the fs_client gets the mds client
         and file system specific pieces.
       - Mount option parsing and debugfs setup are correspondingly broken into
         two pieces.
       - The mon client gets a generic handler callback for otherwise unknown
         messages (mds map, in this case).
       - The basic supported/required feature bits can be expanded (and are by
         ceph_fs_client).
      
      No functional change, aside from some subtle error handling cases that got
      cleaned up in the refactoring process.
      Signed-off-by: Sage Weil <sage@newdream.net>
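
      A rough sketch of the client split described above, with an
      illustrative (not literal) member layout: the generic client
      carries the shared mon/osd pieces, while the filesystem client
      wraps it and adds the mds client and fs-only state.

        struct ceph_mon_client;
        struct ceph_osd_client;
        struct ceph_mds_client;

        /* Generic libceph core (net/ceph, include/linux/ceph). */
        struct ceph_client {
                struct ceph_mon_client *monc;
                struct ceph_osd_client *osdc;
        };

        /* Filesystem-specific wrapper layered on top of the core. */
        struct ceph_fs_client {
                struct ceph_client *client;     /* shared core */
                struct ceph_mds_client *mdsc;   /* fs-only piece */
        };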
  12. 06 Jul 2010, 1 commit
  13. 25 Jun 2010, 2 commits
  14. 04 Dec 2009, 1 commit