提交 · 56a4f3091dceb7dfc14dc3ef1d5f59fe39ba4447 · openeuler / Kernel

05 2月, 2016 2 次提交

crush: ensure take bucket value is valid · 56a4f309

由 Ilya Dryomov 提交于 1月 31, 2016

Ensure that the take argument is a valid bucket ID before indexing the
buckets array.

Reflects ceph.git commit 93ec538e8a667699876b72459b8ad78966d89c61.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NSage Weil <sage@redhat.com>

56a4f309

crush: ensure bucket id is valid before indexing buckets array · f224a691

由 Ilya Dryomov 提交于 1月 31, 2016

We were indexing the buckets array without verifying the index was
within the [0,max_buckets) range.  This could happen because
a multistep rule does not have enough buckets and has CRUSH_ITEM_NONE
for an intermediate result, which would feed in CRUSH_ITEM_NONE and
make us crash.

Reflects ceph.git commit 976a24a326da8931e689ee22fce35feab5b67b76.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NSage Weil <sage@redhat.com>

f224a691

25 6月, 2015 2 次提交

crush: sync up with userspace · b459be73

由 Ilya Dryomov 提交于 6月 12, 2015

.. up to ceph.git commit 1db1abc8328d ("crush: eliminate ad hoc diff
between kernel and userspace").  This fixes a bunch of recently pulled
coding style issues and makes includes a bit cleaner.

A patch "crush:Make the function crush_ln static" from Nicholas Krause
<xerofoify@gmail.com> is folded in as crush_ln() has been made static
in userspace as well.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

b459be73

crush: fix crash from invalid 'take' argument · 8f529795

由 Ilya Dryomov 提交于 6月 12, 2015

Verify that the 'take' argument is a valid device or bucket.
Otherwise ignore it (do not add the value to the working vector).

Reflects ceph.git commit 9324d0a1af61e1c234cc48e2175b4e6320fff8f4.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

8f529795

22 4月, 2015 3 次提交

crush: straw2 bucket type with an efficient 64-bit crush_ln() · 958a2765

由 Ilya Dryomov 提交于 4月 14, 2015

This is an improved straw bucket that correctly avoids any data movement
between items A and B when neither A nor B's weights are changed.  Said
differently, if we adjust the weight of item C (including adding it anew
or removing it completely), we will only see inputs move to or from C,
never between other items in the bucket.

Notably, there is not intermediate scaling factor that needs to be
calculated.  The mapping function is a simple function of the item weights.

The below commits were squashed together into this one (mostly to avoid
adding and then yanking a ~6000 lines worth of crush_ln_table):

- crush: add a straw2 bucket type
- crush: add crush_ln to calculate nature log efficently
- crush: improve straw2 adjustment slightly
- crush: change crush_ln to provide 32 more digits
- crush: fix crush_get_bucket_item_weight and bucket destroy for straw2
- crush/mapper: fix divide-by-0 in straw2
  (with div64_s64() for draw = ln / w and INT64_MIN -> S64_MIN - need
   to create a proper compat.h in ceph.git)

Reflects ceph.git commits 242293c908e923d474910f2b8203fa3b41eb5a53,
                          32a1ead92efcd351822d22a5fc37d159c65c1338,
                          6289912418c4a3597a11778bcf29ed5415117ad9,
                          35fcb04e2945717cf5cfe150b9fa89cb3d2303a1,
                          6445d9ee7290938de1e4ee9563912a6ab6d8ee5f,
                          b5921d55d16796e12d66ad2c4add7305f9ce2353.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

958a2765

crush: ensuring at most num-rep osds are selected · 45002267

由 Ilya Dryomov 提交于 4月 14, 2015

Crush temporary buffers are allocated as per replica size configured
by the user.  When there are more final osds (to be selected as per
rule) than the replicas, buffer overlaps and it causes crash.  Now, it
ensures that at most num-rep osds are selected even if more number of
osds are allowed by the rule.

Reflects ceph.git commits 6b4d1aa99718e3b367496326c1e64551330fabc0,
                          234b066ba04976783d15ff2abc3e81b6cc06fb10.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

45002267

I
crush: drop unnecessary include from mapper.c · 9be6df21
由 Ilya Dryomov 提交于 4月 14, 2015
```
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
```
9be6df21

05 4月, 2014 4 次提交

crush: add SET_CHOOSELEAF_VARY_R step · d83ed858

由 Ilya Dryomov 提交于 3月 19, 2014

This lets you adjust the vary_r tunable on a per-rule basis.

Reflects ceph.git commit f944ccc20aee60a7d8da7e405ec75ad1cd449fac.
Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

d83ed858

crush: add chooseleaf_vary_r tunable · e2b149cc

由 Ilya Dryomov 提交于 3月 19, 2014

The current crush_choose_firstn code will re-use the same 'r' value for
the recursive call.  That means that if we are hitting a collision or
rejection for some reason (say, an OSD that is marked out) and need to
retry, we will keep making the same (bad) choice in that recursive
selection.

Introduce a tunable that fixes that behavior by incorporating the parent
'r' value into the recursive starting point, so that a different path
will be taken in subsequent placement attempts.

Note that this was done from the get-go for the new crush_choose_indep
algorithm.

This was exposed by a user who was seeing PGs stuck in active+remapped
after reweight-by-utilization because the up set mapped to a single OSD.

Reflects ceph.git commit a8e6c9fbf88bad056dd05d3eb790e98a5e43451a.
Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

e2b149cc

crush: allow crush rules to set (re)tries counts to 0 · 6ed1002f

由 Ilya Dryomov 提交于 3月 19, 2014

These two fields are misnomers; they are *retry* counts.

Reflects ceph.git commit f17caba8ae0cad7b6f8f35e53e5f73b444696835.
Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

6ed1002f

crush: fix off-by-one errors in total_tries refactor · 48a163db

由 Ilya Dryomov 提交于 3月 19, 2014

Back in 27f4d1f6bc32c2ed7b2c5080cbd58b14df622607 we refactored the CRUSH
code to allow adjustment of the retry counts on a per-pool basis. That
commit had an off-by-one bug: the previous "tries" counter was a *retry*
count, not a *try* count, but the new code was passing in 1 meaning
there should be no retries.

Fix the ftotal vs tries comparison to use < instead of <= to fix the
problem. Note that the original code used <= here, which means the
global "choose_total_tries" tunable is actually counting retries.
Compensate for that by adding 1 in crush_do_rule when we pull the tunable
into the local variable.

This was noticed looking at output from a user provided osdmap.
Unfortunately the map doesn't illustrate the change in mapping behavior
and I haven't managed to construct one yet that does. Inspection of the
crush debug output now aligns with prior versions, though.

Reflects ceph.git commit 795704fd615f0b008dcc81aa088a859b2d075138.
Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>

48a163db

01 1月, 2014 18 次提交

crush: fix crush_choose_firstn comment · 0e32d712

由 Ilya Dryomov 提交于 12月 24, 2013

Reflects ceph.git commit 8b38f10bc2ee3643a33ea5f9545ad5c00e4ac5b4.
Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

0e32d712

crush: attempts -> tries · 2d8be0bc

由 Ilya Dryomov 提交于 12月 24, 2013

Reflects ceph.git commit ea3a0bb8b773360d73b8b77fa32115ef091c9857.
Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

2d8be0bc

crush: add set_choose_local_[fallback_]tries steps · f046bf92

由 Ilya Dryomov 提交于 12月 24, 2013

This allows all of the tunables to be overridden by a specific rule.

Reflects ceph.git commits d129e09e57fbc61cfd4f492e3ee77d0750c9d292,
                          0497db49e5973b50df26251ed0e3f4ac7578e66e.
Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

f046bf92

crush: generalize descend_once · d390bb2a

由 Ilya Dryomov 提交于 12月 24, 2013

The legacy behavior is to make the normal number of tries for the
recursive chooseleaf call.  The descend_once tunable changed this to
making a single try and bail if we get a reject (note that it is
impossible to collide in the recursive case).

The new set_chooseleaf_tries lets you select the number of recursive
chooseleaf attempts for indep mode, or default to 1.  Use the same
behavior for firstn, except default to total_tries when the legacy
tunables are set (for compatibility).  This makes the rule step
override the (new) default of 1 recursive attempt, keeping behavior
consistent with indep mode.

Reflects ceph.git commit 685c6950ef3df325ef04ce7c986e36ca2514c5f1.
Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

d390bb2a

crush: CHOOSE_LEAF -> CHOOSELEAF throughout · 917edad5

由 Ilya Dryomov 提交于 12月 24, 2013

This aligns the internal identifier names with the user-visible names in
the decompiled crush map language.

Reflects ceph.git commit caa0e22e15e4226c3671318ba1f61314bf6da2a6.
Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

917edad5

crush: add SET_CHOOSE_TRIES rule step · cc10df4a

由 Ilya Dryomov 提交于 12月 24, 2013

Since we can specify the recursive retries in a rule, we may as well also
specify the non-recursive tries too for completeness.

Reflects ceph.git commit d1b97462cffccc871914859eaee562f2786abfd1.
Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

cc10df4a

crush: apply chooseleaf_tries to firstn mode too · f18650ac

由 Ilya Dryomov 提交于 12月 24, 2013

Parameterize the attempts for the _firstn choose method, and apply the
rule-specified tries count to firstn mode as well. Note that we have
slightly different behavior here than with indep:

If the firstn value is not specified for firstn, we pass through the
normal attempt count. This maintains compatibility with legacy behavior.
Note that this is usually *not* actually N^2 work, though, because of the
descend_once tunable. However, descend_once is unfortunately *not* the
same thing as 1 chooseleaf try because it is only checked on a reject but
not on a collision. Sigh.

In contrast, for indep, if tries is not specified we default to 1
recursive attempt, because that is simply more sane, and we have the
option to do so. The descend_once tunable has no effect for indep.

Reflects ceph.git commit 64aeded50d80942d66a5ec7b604ff2fcbf5d7b63.
Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

f18650ac

crush: new SET_CHOOSE_LEAF_TRIES command · be3226ac

由 Ilya Dryomov 提交于 12月 24, 2013

Explicitly control the number of sample attempts, and allow the number of
tries in the recursive call to be explicitly controlled via the rule. This
is important because the amount of time we want to spend looking for a
solution may be rule dependent (e.g., higher for the wide indep pool than
the rep pools).

(We should do the same for the other tunables, by the way!)

Reflects ceph.git commit c43c893be872f709c787bc57f46c0e97876ff681.
Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

be3226ac

crush: pass parent r value for indep call · 41586081

由 Ilya Dryomov 提交于 12月 24, 2013

Pass down the parent's 'r' value so that we will sample different values in
the recursive call when the parent tries multiple times. This avoids doing
useless work (calling multiple times and trying the same values).

Reflects ceph.git commit 2731d3030d7a3e80922b7f1b7756f9a4a124bac5.
Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

41586081

crush: clarify numrep vs endpos · ab4ce2b5

由 Ilya Dryomov 提交于 12月 24, 2013

Pass numrep (the width of the result) separately from the number of results
we want *this* iteration.  This makes things less awkward when we do a
recursive call (for chooseleaf) and want only one item.

Reflects ceph.git commit 1b567ee08972f268c11b43fc881e57b5984dd08b.
Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

ab4ce2b5

crush: strip firstn conditionals out of crush_choose, rename · 9fe07182

由 Ilya Dryomov 提交于 12月 24, 2013

Now that indep is handled by crush_choose_indep, rename crush_choose to
crush_choose_firstn and remove all the conditionals.  This ends up
stripping out *lots* of code.

Note that it *also* makes it obvious that the shenanigans we were playing
with r' for uniform buckets were broken for firstn mode.  This appears to
have happened waaaay back in commit dae8bec9 (or earlier)... 2007.

Reflects ceph.git commit 94350996cb2035850bcbece6a77a9b0394177ec9.
Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

9fe07182

crush: add note about r in recursive choose · 3102b0a5

由 Ilya Dryomov 提交于 12月 24, 2013

Reflects ceph.git commit 4551fee9ad89d0427ed865d766d0d44004d3e3e1.
Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

3102b0a5

crush: use breadth-first search for indep mode · 9a3b490a

由 Ilya Dryomov 提交于 12月 24, 2013

Reflects ceph.git commit 86e978036a4ecbac4c875e7c00f6c5bbe37282d3.
Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

9a3b490a

crush: return CRUSH_ITEM_UNDEF for failed placements with indep · c6d98a60

由 Ilya Dryomov 提交于 12月 24, 2013

For firstn mode, if we fail to make a valid placement choice, we just
continue and return a short result to the caller. For indep mode, however,
we need to make the position stable, and return an undefined value on
failed placements to avoid shifting later results to the left.

Reflects ceph.git commit b1d4dd4eb044875874a1d01c01c7d766db5d0a80.
Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

c6d98a60

crush: eliminate CRUSH_MAX_SET result size limitation · e8ef19c4

由 Ilya Dryomov 提交于 12月 24, 2013

This is only present to size the temporary scratch arrays that we put on
the stack.  Let the caller allocate them as they wish and remove the
limitation.

Reflects ceph.git commit 1cfe140bf2dab99517589a82a916f4c75b9492d1.
Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

e8ef19c4

crush: fix some comments · 2a4ba74e

由 Ilya Dryomov 提交于 12月 24, 2013

Reflects ceph.git commit 3cef755428761f2481b1dd0e0fbd0464ac483fc5.
Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

2a4ba74e

crush: reduce scope of some local variables · 8f99c85b

由 Ilya Dryomov 提交于 12月 24, 2013

Reflects ceph.git commit e7d47827f0333c96ad43d257607fb92ed4176550.
Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

8f99c85b

crush: pass weight vector size to map function · b3b33b0e

由 Ilya Dryomov 提交于 12月 24, 2013

Pass the size of the weight vector into crush_do_rule() to ensure that we
don't access values past the end.  This can happen if the caller misbehaves
and passes a weight vector that is smaller than max_devices.

Currently the monitor tries to prevent that from happening, but this will
gracefully tolerate previous bad osdmaps that got into this state.  It's
also a bit more defensive.

Reflects ceph.git commit 5922e2c2b8335b5e46c9504349c3a55b7434c01a.
Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>

b3b33b0e

18 1月, 2013 2 次提交

crush: avoid recursion if we have already collided · 7d7c1f61

由 Sage Weil 提交于 1月 15, 2013

This saves us some cycles, but does not affect the placement result at
all.

This corresponds to ceph.git commit 4abb53d4f.
Signed-off-by: NSage Weil <sage@inktank.com>

7d7c1f61

libceph: for chooseleaf rules, retry CRUSH map descent from root if leaf is failed · 1604f488

由 Jim Schutt 提交于 11月 30, 2012

Add libceph support for a new CRUSH tunable recently added to Ceph servers.

Consider the CRUSH rule
step chooseleaf firstn 0 type <node_type>

This rule means that <n> replicas will be chosen in a manner such that
each chosen leaf's branch will contain a unique instance of <node_type>.

When an object is re-replicated after a leaf failure, if the CRUSH map uses
a chooseleaf rule the remapped replica ends up under the <node_type> bucket
that held the failed leaf. This causes uneven data distribution across the
storage cluster, to the point that when all the leaves but one fail under a
particular <node_type> bucket, that remaining leaf holds all the data from
its failed peers.

This behavior also limits the number of peers that can participate in the
re-replication of the data held by the failed leaf, which increases the
time required to re-replicate after a failure.

For a chooseleaf CRUSH rule, the tree descent has two steps: call them the
inner and outer descents.

If the tree descent down to <node_type> is the outer descent, and the descent
from <node_type> down to a leaf is the inner descent, the issue is that a
down leaf is detected on the inner descent, so only the inner descent is
retried.

In order to disperse re-replicated data as widely as possible across a
storage cluster after a failure, we want to retry the outer descent. So,
fix up crush_choose() to allow the inner descent to return immediately on
choosing a failed leaf. Wire this up as a new CRUSH tunable.

Note that after this change, for a chooseleaf rule, if the primary OSD
in a placement group has failed, choosing a replacement may result in
one of the other OSDs in the PG colliding with the new primary. This
requires that OSD's data for that PG to need moving as well. This
seems unavoidable but should be relatively rare.

This corresponds to ceph.git commit 88f218181a9e6d2292e2697fc93797d0f6d6e5dc.
Signed-off-by: NJim Schutt <jaschut@sandia.gov>
Reviewed-by: NSage Weil <sage@inktank.com>

1604f488

31 7月, 2012 1 次提交

libceph: support crush tunables · 546f04ef

由 Sage Weil 提交于 7月 30, 2012

The server side recently added support for tuning some magic
crush variables. Decode these variables if they are present, or use the
default values if they are not present.

Corresponds to ceph.git commit 89af369c25f274fe62ef730e5e8aad0c54f1e5a5.
Signed-off-by: Ncaleb miles <caleb.miles@inktank.com>
Reviewed-by: NSage Weil <sage@inktank.com>
Reviewed-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NYehuda Sadeh <yehuda@inktank.com>

546f04ef

08 5月, 2012 5 次提交

crush: remove forcefeed functionality · 41ebcc09

由 Sage Weil 提交于 5月 07, 2012

Remove forcefeed functionality from CRUSH.  This is an ugly misfeature that
is mostly useless and unused.  Remove it.

Reflects ceph.git commit ed974b5000f2851207d860a651809af4a1867942.
Reviewed-by: NAlex Elder <elder@inktank.com>
Signed-off-by: NSage Weil <sage@inktank.com>

Conflicts:

	net/ceph/crush/mapper.c

41ebcc09

crush: use a temporary variable to simplify crush_do_rule · 0668216e

由 Sage Weil 提交于 5月 07, 2012

Use a temporary variable here to avoid repeated array lookups and clean up
the code a bit.

This reflects ceph.git commit 6b5be27634ad307b471a5bf0db85c4f5c834885f.
Reviewed-by: NAlex Elder <elder@inktank.com>
Signed-off-by: NSage Weil <sage@inktank.com>

0668216e

crush: be more tolerant of nonsensical crush maps · a1f4895b

由 Sage Weil 提交于 5月 07, 2012

If we get a map that doesn't make sense, error out or ignore the badness
instead of BUGging out.  This reflects the ceph.git commits
9895f0bff7dc68e9b49b572613d242315fb11b6c and
8ded26472058d5205803f244c2f33cb6cb10de79.
Reviewed-by: NAlex Elder <elder@inktank.com>
Signed-off-by: NSage Weil <sage@inktank.com>

a1f4895b

crush: adjust local retry threshold · c90f95ed

由 Sage Weil 提交于 5月 07, 2012

This small adjustment reflects a change that was made in ceph.git commit
af6a9f30696c900a2a8bd7ae24e8ed15fb4964bb, about 6 months ago.  An N-1
search is not exhaustive.  Fixed ceph.git bug #1594.
Reviewed-by: NAlex Elder <elder@inktank.com>
Signed-off-by: NSage Weil <sage@inktank.com>

c90f95ed

crush: clean up types, const-ness · 8b12d47b

由 Sage Weil 提交于 5月 07, 2012

Move various types from int -> __u32 (or similar), and add const as
appropriate.

This reflects changes that have been present in the userland implementation
for some time.
Reviewed-by: NAlex Elder <elder@inktank.com>
Signed-off-by: NSage Weil <sage@inktank.com>

8b12d47b

27 4月, 2012 1 次提交

crush: include header for global symbols · feb50ac1

由 hartleys 提交于 4月 24, 2012

Include the header to pickup the definitions of the global symbols.

Quiets the following sparse warnings:

warning: symbol 'crush_find_rule' was not declared. Should it be static?
warning: symbol 'crush_do_rule' was not declared. Should it be static?
Signed-off-by: NH Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Sage Weil <sage@newdream.net>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

feb50ac1

16 4月, 2012 1 次提交

net: cleanup unsigned to unsigned int · 95c96174

由 Eric Dumazet 提交于 4月 15, 2012

Use of "unsigned int" is preferred to bare "unsigned" in net tree.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

95c96174

11 1月, 2012 1 次提交
- S
  crush: fix force for non-root TAKE · e11b05d3
  由 Sage Weil 提交于 12月 12, 2011
```
Signed-off-by: NSage Weil <sage@newdream.net>
```
  e11b05d3

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功