1. 25 Jun 2015, 1 commit
    • crush: sync up with userspace · b459be73
      Committed by Ilya Dryomov
      .. up to ceph.git commit 1db1abc8328d ("crush: eliminate ad hoc diff
      between kernel and userspace").  This fixes a bunch of recently pulled
      coding style issues and makes includes a bit cleaner.
      
      A patch "crush:Make the function crush_ln static" from Nicholas Krause
      <xerofoify@gmail.com> is folded in as crush_ln() has been made static
      in userspace as well.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2. 22 Apr 2015, 1 commit
    • crush: straw2 bucket type with an efficient 64-bit crush_ln() · 958a2765
      Committed by Ilya Dryomov
      This is an improved straw bucket that correctly avoids any data movement
      between items A and B when neither A nor B's weights are changed.  Said
      differently, if we adjust the weight of item C (including adding it anew
      or removing it completely), we will only see inputs move to or from C,
      never between other items in the bucket.
      
      Notably, there is no intermediate scaling factor that needs to be
      calculated.  The mapping is a simple function of the item weights.
      
      The commits below were squashed together into this one (mostly to avoid
      adding and then yanking ~6000 lines' worth of crush_ln_table):
      
      - crush: add a straw2 bucket type
      - crush: add crush_ln to calculate nature log efficently
      - crush: improve straw2 adjustment slightly
      - crush: change crush_ln to provide 32 more digits
      - crush: fix crush_get_bucket_item_weight and bucket destroy for straw2
      - crush/mapper: fix divide-by-0 in straw2
        (with div64_s64() for draw = ln / w and INT64_MIN -> S64_MIN - need
         to create a proper compat.h in ceph.git)
      
      Reflects ceph.git commits 242293c908e923d474910f2b8203fa3b41eb5a53,
                                32a1ead92efcd351822d22a5fc37d159c65c1338,
                                6289912418c4a3597a11778bcf29ed5415117ad9,
                                35fcb04e2945717cf5cfe150b9fa89cb3d2303a1,
                                6445d9ee7290938de1e4ee9563912a6ab6d8ee5f,
                                b5921d55d16796e12d66ad2c4add7305f9ce2353.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
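      A minimal userspace sketch of the straw2 selection described above
      (all names are illustrative; double-precision log() stands in for the
      kernel's 64-bit fixed-point crush_ln(), and mix_hash() stands in for
      crush_hash32_3()):

        #include <math.h>
        #include <stdint.h>
        #include <stdio.h>

        /* illustrative stand-in for crush_hash32_3(): any decent integer mix */
        static uint32_t mix_hash(uint32_t x, uint32_t item, uint32_t r)
        {
            uint32_t h = x * 2654435761u ^ item * 2246822519u ^ r * 3266489917u;
            h ^= h >> 16;
            h *= 2654435761u;
            return h ^ (h >> 13);
        }

        /* pick one item: each item draws ln(u)/w for its own pseudo-random
         * u in (0, 1]; the largest draw wins, so changing one item's weight
         * only moves inputs to or from that item, never between the others */
        static int straw2_choose(const uint32_t *weights, int size,
                                 uint32_t x, uint32_t r)
        {
            int best = -1;
            double best_draw = -INFINITY;

            for (int i = 0; i < size; i++) {
                if (!weights[i])
                    continue;                       /* zero weight never wins */
                double u = ((mix_hash(x, i, r) & 0xffff) + 1) / 65536.0;
                double draw = log(u) / weights[i];  /* ln(u) <= 0: a larger weight
                                                       pulls the draw toward 0 */
                if (draw > best_draw) {
                    best_draw = draw;
                    best = i;
                }
            }
            return best;
        }

        int main(void)
        {
            uint32_t weights[] = { 0x10000, 0x10000, 0x20000 };  /* 16.16 fixed point */
            printf("x=42 -> item %d\n", straw2_choose(weights, 3, 42, 0));
            return 0;
        }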
3. 05 Apr 2014, 2 commits
    • crush: add SET_CHOOSELEAF_VARY_R step · d83ed858
      Committed by Ilya Dryomov
      This lets you adjust the vary_r tunable on a per-rule basis.
      
      Reflects ceph.git commit f944ccc20aee60a7d8da7e405ec75ad1cd449fac.
      Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
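      A tiny sketch of what such a per-rule override amounts to (names here
      are illustrative, not the actual step handling in the kernel's
      crush_do_rule()): a SET_CHOOSELEAF_VARY_R step simply replaces the
      map-wide default for the remainder of the rule.

        #include <stdio.h>

        /* illustrative stand-ins for the real rule-step opcodes */
        enum step_op { STEP_SET_CHOOSELEAF_VARY_R, STEP_CHOOSELEAF };

        struct rule_step { enum step_op op; int arg1; };

        /* apply any per-rule vary_r override on top of the map-wide default */
        static int effective_vary_r(const struct rule_step *steps, int n,
                                    int map_default)
        {
            int vary_r = map_default;

            for (int i = 0; i < n; i++)
                if (steps[i].op == STEP_SET_CHOOSELEAF_VARY_R && steps[i].arg1 >= 0)
                    vary_r = steps[i].arg1;   /* rule-level override wins */
            return vary_r;
        }

        int main(void)
        {
            struct rule_step rule[] = {
                { STEP_SET_CHOOSELEAF_VARY_R, 1 },
                { STEP_CHOOSELEAF, 0 },
            };
            printf("vary_r = %d\n", effective_vary_r(rule, 2, 0));
            return 0;
        }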
    • crush: add chooseleaf_vary_r tunable · e2b149cc
      Committed by Ilya Dryomov
      The current crush_choose_firstn code will re-use the same 'r' value for
      the recursive call.  That means that if we are hitting a collision or
      rejection for some reason (say, an OSD that is marked out) and need to
      retry, we will keep making the same (bad) choice in that recursive
      selection.
      
      Introduce a tunable that fixes that behavior by incorporating the parent
      'r' value into the recursive starting point, so that a different path
      will be taken in subsequent placement attempts.
      
      Note that this was done from the get-go for the new crush_choose_indep
      algorithm.
      
      This was exposed by a user who was seeing PGs stuck in active+remapped
      after reweight-by-utilization because the up set mapped to a single OSD.
      
      Reflects ceph.git commit a8e6c9fbf88bad056dd05d3eb790e98a5e43451a.
      Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
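      A conceptual, self-contained sketch of that behaviour (all names are
      illustrative, not the kernel's crush_choose_firstn()): with vary_r
      enabled, the inner leaf descent inherits the parent's r, so each outer
      retry explores a different path instead of repeating the same choice.

        #include <stdio.h>

        #define HOSTS           4
        #define LEAVES_PER_HOST 4
        #define MAX_ATTEMPTS    8

        /* toy integer mix standing in for crush_hash32_3() */
        static unsigned mix(unsigned a, unsigned b, unsigned c)
        {
            unsigned h = a * 2654435761u ^ b * 2246822519u ^ c * 3266489917u;
            h ^= h >> 15;
            return h * 668265263u;
        }

        /* pretend OSD 6 (host 1, leaf 2) is marked out */
        static int leaf_is_usable(int osd) { return osd != 6; }

        static int choose_host(unsigned x, int r) { return mix(x, 0, r) % HOSTS; }

        static int choose_leaf(int host, unsigned x, int r)
        {
            return host * LEAVES_PER_HOST + mix(x, host + 1, r) % LEAVES_PER_HOST;
        }

        static int choose_firstn(unsigned x, int rep, int vary_r, int parent_r)
        {
            int r = rep + parent_r;            /* a retrying parent shifts our r */

            for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++, r++) {
                int host = choose_host(x, r);
                /* without vary_r the inner descent always starts from r = 0 and
                 * can keep re-selecting the same failed leaf under a retried
                 * host; with vary_r it inherits r and varies on each retry */
                int osd = choose_leaf(host, x, vary_r ? r : 0);
                if (leaf_is_usable(osd))
                    return osd;
            }
            return -1;
        }

        int main(void)
        {
            printf("vary_r=0 -> osd %d\n", choose_firstn(42, 0, 0, 0));
            printf("vary_r=1 -> osd %d\n", choose_firstn(42, 0, 1, 0));
            return 0;
        }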
4. 01 Jan 2014, 10 commits
5. 18 Jan 2013, 1 commit
    • libceph: for chooseleaf rules, retry CRUSH map descent from root if leaf is failed · 1604f488
      Committed by Jim Schutt
      Add libceph support for a new CRUSH tunable recently added to Ceph servers.
      
      Consider the CRUSH rule
        step chooseleaf firstn 0 type <node_type>
      
      This rule means that <n> replicas will be chosen in a manner such that
      each chosen leaf's branch will contain a unique instance of <node_type>.
      
      When an object is re-replicated after a leaf failure, if the CRUSH map uses
      a chooseleaf rule the remapped replica ends up under the <node_type> bucket
      that held the failed leaf.  This causes uneven data distribution across the
      storage cluster, to the point that when all the leaves but one fail under a
      particular <node_type> bucket, that remaining leaf holds all the data from
      its failed peers.
      
      This behavior also limits the number of peers that can participate in the
      re-replication of the data held by the failed leaf, which increases the
      time required to re-replicate after a failure.
      
      For a chooseleaf CRUSH rule, the tree descent has two steps: call them the
      inner and outer descents.
      
      If the tree descent down to <node_type> is the outer descent, and the descent
      from <node_type> down to a leaf is the inner descent, the issue is that a
      down leaf is detected on the inner descent, so only the inner descent is
      retried.
      
      In order to disperse re-replicated data as widely as possible across a
      storage cluster after a failure, we want to retry the outer descent. So,
      fix up crush_choose() to allow the inner descent to return immediately on
      choosing a failed leaf.  Wire this up as a new CRUSH tunable.
      
      Note that after this change, for a chooseleaf rule, if the primary OSD
      in a placement group has failed, choosing a replacement may result in
      one of the other OSDs in the PG colliding with the new primary.  This
      means that OSD's data for the PG needs to move as well.  This
      seems unavoidable but should be relatively rare.
      
      This corresponds to ceph.git commit 88f218181a9e6d2292e2697fc93797d0f6d6e5dc.
      Signed-off-by: Jim Schutt <jaschut@sandia.gov>
      Reviewed-by: Sage Weil <sage@inktank.com>
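      A conceptual sketch of the two descents described above (illustrative
      names, not the kernel's crush_choose()): with the tunable on, the inner
      descent bails out as soon as it hits a failed leaf, letting the outer
      descent retry from the root and land in a different <node_type> bucket.

        #include <stdio.h>

        #define MAX_OUTER_TRIES 8
        #define MAX_INNER_TRIES 8
        #define BUCKETS         4
        #define LEAVES_PER_BKT  4

        /* toy integer mix standing in for crush_hash32_3() */
        static unsigned mix(unsigned a, unsigned b, unsigned c)
        {
            unsigned h = a * 2654435761u ^ b * 2246822519u ^ c * 3266489917u;
            h ^= h >> 15;
            return h * 668265263u;
        }

        /* pretend every leaf under bucket 2 (leaves 8..11) except 9 is down */
        static int leaf_is_up(int leaf) { return leaf < 8 || leaf == 9 || leaf >= 12; }

        static int choose_bucket_from_root(unsigned x, int r) { return mix(x, 0, r) % BUCKETS; }

        static int choose_leaf_under(int b, unsigned x, int r)
        {
            return b * LEAVES_PER_BKT + mix(x, b + 1, r) % LEAVES_PER_BKT;
        }

        static int chooseleaf(unsigned x, int retry_descent)
        {
            for (int outer_r = 0; outer_r < MAX_OUTER_TRIES; outer_r++) {
                int bucket = choose_bucket_from_root(x, outer_r);     /* outer descent */

                for (int inner_r = 0; inner_r < MAX_INNER_TRIES; inner_r++) {
                    int leaf = choose_leaf_under(bucket, x, inner_r); /* inner descent */
                    if (leaf_is_up(leaf))
                        return leaf;
                    if (retry_descent)
                        break;  /* new: give up here so the outer descent
                                   retries from the root */
                    /* legacy: keep retrying under this same bucket, so its
                       surviving leaves absorb all of the failed leaf's data */
                }
                if (!retry_descent)
                    break;      /* legacy: the outer descent was never retried */
            }
            return -1;
        }

        int main(void)
        {
            printf("legacy   -> leaf %d\n", chooseleaf(7, 0));
            printf("retrying -> leaf %d\n", chooseleaf(7, 1));
            return 0;
        }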
6. 03 Oct 2012, 1 commit
7. 31 Jul 2012, 1 commit
8. 08 May 2012, 4 commits
9. 21 Oct 2010, 1 commit
    • ceph: factor out libceph from Ceph file system · 3d14c5d2
      Committed by Yehuda Sadeh
      This factors out protocol and low-level storage parts of ceph into a
      separate libceph module living in net/ceph and include/linux/ceph.  This
      is mostly a matter of moving files around.  However, a few key pieces
      of the interface change as well:
      
       - ceph_client becomes ceph_fs_client and ceph_client, where the latter
         captures the mon and osd clients, and the fs_client gets the mds client
         and file system specific pieces.
       - Mount option parsing and debugfs setup are correspondingly broken into
         two pieces.
       - The mon client gets a generic handler callback for otherwise unknown
         messages (mds map, in this case).
       - The basic supported/required feature bits can be expanded (and are by
         ceph_fs_client).
      
      No functional change, aside from some subtle error handling cases that got
      cleaned up in the refactoring process.
      Signed-off-by: Sage Weil <sage@newdream.net>
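      A rough sketch of the resulting split (fields heavily abridged and the
      member types stubbed out so the sketch stands alone; the real layouts
      live in include/linux/ceph/libceph.h and fs/ceph/super.h):

        /* stand-ins so this sketch compiles on its own */
        struct ceph_mon_client { int stub; };
        struct ceph_osd_client { int stub; };
        struct ceph_mds_client { int stub; };

        /* generic client, now in net/ceph (libceph): mon + osd machinery */
        struct ceph_client {
            struct ceph_mon_client monc;
            struct ceph_osd_client osdc;
            /* ... messenger, options, supported/required feature bits ... */
        };

        /* filesystem client, in fs/ceph: wraps the generic client and adds
         * the MDS client plus fs-only state (mount options, debugfs, ...) */
        struct ceph_fs_client {
            struct ceph_client *client;
            struct ceph_mds_client *mdsc;
            /* ... */
        };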