Commits · 9eebe45c091e2dff22d4bd87360a624303148ed1 · openeuler / Kernel

Jul 07, 2017 · 40 commits

crush: remove an obsolete comment · 9eebe45c

Committed by Ilya Dryomov on Jun 22, 2017

Reflects ceph.git commit dca1ae1e0a6b02029c3a7f9dec4114972be26d50.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

crush: crush_init_workspace starts with struct crush_work · b88ed8d8

Committed by Ilya Dryomov on Jun 22, 2017

The workspace set up by crush_init_workspace() starts with the whole
struct crush_work, not just a pointer to it. Treating it as a pointer
causes no problem today, because crush_work contains only a pointer, but
it would become a problem if new data members were added to crush_work.

Reflects ceph.git commit ee957dd431bfbeb6dadaf77764db8e0757417328.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph, crush: per-pool crush_choose_arg_map for crush_do_rule() · 5cf9c4a9

Committed by Ilya Dryomov on Jun 22, 2017

If there is no crush_choose_arg_map for a given pool, a NULL pointer is
passed to preserve existing crush_do_rule() behavior.

Reflects ceph.git commits 55fb91d64071552ea1bc65ab4ea84d3c8b73ab4b,
dbe36e08be00c6519a8c89718dd47b0219c20516.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

crush: implement weight and id overrides for straw2 · 069f3222

Committed by Ilya Dryomov on Jun 22, 2017

bucket_straw2_choose needs to use weights that may differ from
item_weights: for instance, to compensate for an uneven distribution
caused by a small number of values, or to fix the probability bias
introduced by conditional probabilities (see
http://tracker.ceph.com/issues/15653 for more information).

We introduce a weight_set for each straw2 bucket to set the desired
weight for a given item at a given position. The weight of an item when
picking the first replica (first position) may differ from its weight
when picking the second replica (second position). For instance, the
weight matrix for a bucket containing items 3, 7 and 13 could be as
follows:

          position 0   position 1

item 3     0x10000      0x100000
item 7     0x40000       0x10000
item 13    0x40000       0x10000

When crush_do_rule picks the first of two replicas (position 0), items 7
and 13 are each four times more likely to be chosen by
bucket_straw2_choose than item 3. When choosing the second replica
(position 1), item 3 is sixteen times (0x100000 / 0x10000 = 0x10) more
likely to be chosen than item 7 or item 13.

By default the weight_set of each bucket exactly matches the content of
item_weights for each position to ensure backward compatibility.

bucket_straw2_choose compares items by their id. The same ids are also
used to index buckets, and they must be unique. For placement purposes,
an array of replacement ids (one per item in the bucket) can be provided;
when present, these ids are used instead of the item ids. If no
replacement ids are provided, the legacy behavior is preserved.

Reflects ceph.git commit 19537a450fd5c5a0bb8b7830947507a76db2ceca.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: apply_upmap() · 1c2e7b45

Committed by Ilya Dryomov on Jun 21, 2017

Previously, pg_to_raw_osds() didn't filter out nonexistent OSDs, because
raw_to_up_osds() would filter for "up" ("up" is predicated on "exists")
and raw_to_up_osds() was called directly after pg_to_raw_osds().  Now,
with the apply_upmap() call in between, nonexistent OSDs in
pg_to_raw_osds() output can affect apply_upmap().  Introduce
remove_nonexistent_osds() to deal with that.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: compute actual pgid in ceph_pg_to_up_acting_osds() · 463bb8da

Committed by Ilya Dryomov on Jun 21, 2017

Move raw_pg_to_pg() call out of get_temp_osds() and into
ceph_pg_to_up_acting_osds(), for upcoming apply_upmap().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: pg_upmap[_items] infrastructure · 6f428df4

Committed by Ilya Dryomov on Jun 21, 2017

pg_temp and pg_upmap encodings are the same (PG -> array of osds),
except for the incremental remove: it's an empty mapping in new_pg_temp
for pg_temp and a separate old_pg_upmap set for pg_upmap.  (This isn't
to allow for empty pg_upmap mappings -- apparently, pg_temp just wasn't
looked at as an example for pg_upmap encoding.)

Reuse __decode_pg_temp() for decoding pg_upmap and new_pg_upmap.
__decode_pg_temp() stores into pg_temp union member, but since pg_upmap
union member is identical, reading through pg_upmap later is OK.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: ceph_decode_skip_* helpers · 278b1d70

Committed by Ilya Dryomov on Jun 21, 2017

Some of these won't be as efficient as they could be (e.g.
ceph_decode_skip_set(... 32 ...) could advance by len * sizeof(u32)
once instead of advancing by sizeof(u32) len times), but that's fine
and not worth a bunch of extra macro code.

Replace skip_name_map() with ceph_decode_skip_map as an example.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: kill __{insert,lookup,remove}_pg_mapping() · ab75144b

Committed by Ilya Dryomov on Jun 21, 2017

Switch to DEFINE_RB_FUNCS2-generated {insert,lookup,erase}_pg_mapping().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: introduce and switch to decode_pg_mapping() · a303bb0e

Committed by Ilya Dryomov on Jun 21, 2017

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: don't pass pgid by value · 33333d10

Committed by Ilya Dryomov on Jun 21, 2017

Make __{lookup,remove}_pg_mapping() look like their ceph_spg_mapping
counterparts: take const struct ceph_pg *.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: respect RADOS_BACKOFF backoffs · a02a946d

Committed by Ilya Dryomov on Jun 19, 2017

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: make DEFINE_RB_* helpers more general · 76f827a7

Committed by Ilya Dryomov on Jun 19, 2017

Initially for ceph_pg_mapping, ceph_spg_mapping and ceph_hobject_id,
compared with ceph_pg_compare(), ceph_spg_compare() and hoid_compare()
respectively.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: avoid unnecessary pi lookups in calc_target() · df28152d

Committed by Ilya Dryomov on Jun 15, 2017

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: use target pi for calc_target() calculations · 6d637a54