Commits · 3fb99d483e614bc3834784c7a686572c7970bb92 · openeuler / Kernel

07 Sep, 2017 1 commit

Committed by Yanhu Cao on Jul 21, 2017

startsync is a no-op, has been for years.  Remove it.

Link: http://tracker.ceph.com/issues/20604
Signed-off-by: Yanhu Cao <gmayyyha@gmail.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

3fb99d48

01 Aug, 2017 6 commits

libceph: make RECOVERY_DELETES feature create a new interval · ae78dd81

Committed by Ilya Dryomov on Jul 27, 2017

This is needed so that the OSDs can regenerate the missing set at the
start of a new interval where support for recovery deletes changed.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>

ae78dd81

libceph: upmap semantic changes · f53b7665

Committed by Ilya Dryomov on Jul 27, 2017

- apply both pg_upmap and pg_upmap_items
- allow bidirectional swap of pg-upmap-items
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>

f53b7665

crush: assume weight_set != null implies weight_set_size > 0 · c7ed1a4b

Committed by Ilya Dryomov on Jul 24, 2017

Reflects ceph.git commit 5e8fa3e06b68fae1582c9230a3a8d1abc6146286.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>

c7ed1a4b

libceph: fallback for when there isn't a pool-specific choose_arg · e17e8969

Committed by Ilya Dryomov on Jul 24, 2017

There is now a fallback to a choose_arg index of -1 if there isn't
a pool-specific choose_arg set.  If you create a per-pool weight-set,
that works for that pool.  Otherwise we try the compat/default one.  If
that doesn't exist either, then we use the normal CRUSH weights.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>

e17e8969
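
The lookup order described in e17e8969 can be sketched in user-space C: try the pool-specific set first, then the set at index -1, and otherwise return nothing, which here means the normal CRUSH weights apply. The names `struct arg_map`, `lookup_arg_map()` and `pick_arg_map()` are hypothetical and not taken from libceph, and the linear search is an assumption made only to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a choose_arg set keyed by pool id. */
struct arg_map {
    int64_t index;       /* pool id, or -1 for the compat/default set */
    const char *label;   /* illustrative payload only */
};

/* Linear search; returns NULL if no entry has the given index. */
static const struct arg_map *lookup_arg_map(const struct arg_map *maps,
                                            size_t n, int64_t index)
{
    for (size_t i = 0; i < n; i++)
        if (maps[i].index == index)
            return &maps[i];
    return NULL;
}

/*
 * Pool-specific entry first, then the entry at index -1.
 * A NULL result means "fall back to the normal CRUSH weights".
 */
static const struct arg_map *pick_arg_map(const struct arg_map *maps,
                                          size_t n, int64_t pool_id)
{
    const struct arg_map *m = lookup_arg_map(maps, n, pool_id);

    if (!m)
        m = lookup_arg_map(maps, n, -1);
    return m;
}
```

A caller would treat a NULL return the same way as the pre-existing "no choose_arg" case.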

libceph: don't call ->reencode_message() more than once per message · 4690faf0

Committed by Ilya Dryomov on Jul 26, 2017

Reencoding an already reencoded message is a bad idea.  This could
happen on Policy::stateful_server connections (!CEPH_MSG_CONNECT_LOSSY),
such as MDS sessions.

This didn't pop up in testing because currently only OSD requests are
reencoded and OSD sessions are always lossy.

Fixes: 98ad5ebd ("libceph: ceph_connection_operations::reencode_message() method")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>

4690faf0

libceph: make encode_request_*() work with r_mempool requests · 986e8989

Committed by Ilya Dryomov on Jul 25, 2017

Messages allocated out of ceph_msgpool have a fixed front length
(pool->front_len).  Asserting that the entire front has been filled
while encoding is thus wrong.

Fixes: 8cb441c0 ("libceph: MOSDOp v8 encoding (actual spgid + full hash)")
Reported-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>

986e8989

17 Jul, 2017 5 commits

libceph: potential NULL dereference in ceph_msg_data_create() · 7c40b22f

Committed by Dan Carpenter on Jul 17, 2017

If kmem_cache_zalloc() returns NULL then the INIT_LIST_HEAD(&data->links);
will Oops.  The callers aren't really prepared for NULL returns so it
doesn't make a lot of difference in real life.

Fixes: 5240d9f9 ("libceph: replace message data pointer with list")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

7c40b22f

libceph: don't call encode_request_finish() on MOSDBackoff messages · 914902af

Committed by Ilya Dryomov on Jul 14, 2017

encode_request_finish() is for MOSDOp messages.  Calling it on
MOSDBackoff ack-block messages corrupts them.

Fixes: a02a946d ("libceph: respect RADOS_BACKOFF backoffs")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

914902af

libceph: use alloc_pg_mapping() in __decode_pg_upmap_items() · f5cc6898

Committed by Ilya Dryomov on Jul 07, 2017

... otherwise we die in insert_pg_mapping(), which wants pg->node to be
empty, i.e. initialized with RB_CLEAR_NODE.

Fixes: 6f428df4 ("libceph: pg_upmap[_items] infrastructure")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

f5cc6898

libceph: set -EINVAL in one place in crush_decode() · c2acfd95

Committed by Ilya Dryomov on Jul 13, 2017

No sooner than Dan had fixed this issue in commit 293dffaa
("libceph: NULL deref on crush_decode() error path"), I brought it
back.  Add a new label and set -EINVAL once, right before failing.

Fixes: 278b1d70 ("libceph: ceph_decode_skip_* helpers")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

c2acfd95

libceph: NULL deref on osdmap_apply_incremental() error path · 00c8ebb3

Committed by Dan Carpenter on Jul 13, 2017

There are hidden gotos in the ceph_decode_* macros.  We need to set the
"err" variable on these error paths otherwise we end up returning
ERR_PTR(0) which is NULL.  It causes NULL dereferences in the callers.

Fixes: 6f428df4 ("libceph: pg_upmap[_items] infrastructure")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
[idryomov@gmail.com: similar bug in osdmap_decode(), changelog tweak]
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

00c8ebb3
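
The failure mode behind 00c8ebb3 and c2acfd95 can be reproduced in a small user-space sketch. `ERR_PTR()` and `IS_ERR()` below are imitations of the kernel helpers, while `DECODE_NEED`, `decode_buggy()`, `decode_fixed()` and the `e_inval` label are hypothetical names chosen for illustration. The point is only that a macro with a hidden goto can reach the error label while `err` is still 0, so the function returns `ERR_PTR(0)`, which is NULL.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* User-space imitations of the kernel's ERR_PTR()/IS_ERR() helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical decode check with a hidden goto on short input. */
#define DECODE_NEED(p, end, n, label) \
    do { if ((size_t)((end) - (p)) < (size_t)(n)) goto label; } while (0)

static int dummy_map;   /* stands in for a successfully decoded map */

/* Buggy shape: err is still 0 when the macro jumps to "bad". */
static void *decode_buggy(const char *p, const char *end)
{
    int err = 0;

    DECODE_NEED(p, end, 4, bad);
    return &dummy_map;
bad:
    return ERR_PTR(err);    /* ERR_PTR(0) is NULL, not an error pointer */
}

/* Fixed shape: a dedicated label sets -EINVAL once, right before failing. */
static void *decode_fixed(const char *p, const char *end)
{
    int err;

    DECODE_NEED(p, end, 4, e_inval);
    return &dummy_map;
e_inval:
    err = -EINVAL;
    return ERR_PTR(err);
}
```

A caller that only checks `IS_ERR()` on the result of `decode_buggy()` would go on to dereference NULL, which matches the symptom described in the commit message.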

07 Jul, 2017 28 commits

libceph: osd_state is 32 bits wide in luminous · 0bb05da2

Committed by Ilya Dryomov on Jun 22, 2017

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

0bb05da2

crush: remove an obsolete comment · 9eebe45c

Committed by Ilya Dryomov on Jun 22, 2017

Reflects ceph.git commit dca1ae1e0a6b02029c3a7f9dec4114972be26d50.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

9eebe45c

crush: crush_init_workspace starts with struct crush_work · b88ed8d8

Committed by Ilya Dryomov on Jun 22, 2017

It is not just a pointer to crush_work, it is the whole structure.
That is not a problem since it only contains a pointer. But it will
be a problem if new data members are added to crush_work.

Reflects ceph.git commit ee957dd431bfbeb6dadaf77764db8e0757417328.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

b88ed8d8

libceph, crush: per-pool crush_choose_arg_map for crush_do_rule() · 5cf9c4a9

Committed by Ilya Dryomov on Jun 22, 2017

If there is no crush_choose_arg_map for a given pool, a NULL pointer is
passed to preserve existing crush_do_rule() behavior.

Reflects ceph.git commits 55fb91d64071552ea1bc65ab4ea84d3c8b73ab4b,
dbe36e08be00c6519a8c89718dd47b0219c20516.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

5cf9c4a9

crush: implement weight and id overrides for straw2 · 069f3222

Committed by Ilya Dryomov on Jun 22, 2017

bucket_straw2_choose needs to use weights that may be different from
item_weights, for instance to compensate for an uneven distribution
caused by a low number of values, or to fix the probability bias
introduced by conditional probabilities (see
http://tracker.ceph.com/issues/15653 for more information).

We introduce a weight_set for each straw2 bucket to set the desired
weight for a given item at a given position. The weight of a given item
when picking the first replica (first position) may be different from
its weight when picking the second replica (second position). For
instance the weight matrix for a given bucket containing items 3, 7 and
13 could be as follows:

          position 0   position 1

item 3     0x10000      0x100000
item 7     0x40000       0x10000
item 13    0x40000       0x10000

When crush_do_rule picks the first of two replicas (position 0), items 7
and 13 are four times more likely to be chosen by bucket_straw2_choose
than item 3. When choosing the second replica (position 1), item 3 is
sixteen times more likely to be chosen than items 7 and 13.

By default the weight_set of each bucket exactly matches the content of
item_weights for each position to ensure backward compatibility.

bucket_straw2_choose compares items by using their id. The same ids are
also used to index buckets and they must be unique. For each item in a
bucket an array of ids can be provided for placement purposes and they
are used instead of the ids. If no replacement ids are provided, the
legacy behavior is preserved.

Reflects ceph.git commit 19537a450fd5c5a0bb8b7830947507a76db2ceca.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

069f3222
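
The per-position weight selection described in 069f3222 can be sketched as follows. This is a simplified user-space illustration, not the CRUSH code: `struct weight_column`, `struct weight_set` and `weights_for_position()` are hypothetical names, and clamping an out-of-range position to the last column is an assumption of this sketch. The hashing and straw drawing done by bucket_straw2_choose are left out entirely.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified types. Weights are 16.16 fixed point (0x10000 is 1.0). */
struct weight_column {
    const uint32_t *weights;               /* one weight per item */
};

struct weight_set {
    const struct weight_column *columns;   /* one column per position */
    size_t size;                           /* number of positions */
};

/*
 * Pick the weights to use for a replica position.
 * Without a weight set, the bucket's own item weights are returned,
 * which keeps the legacy behavior. Clamping a too-large position to
 * the last column is an assumption made for this sketch.
 */
static const uint32_t *weights_for_position(const struct weight_set *ws,
                                            const uint32_t *item_weights,
                                            size_t position)
{
    if (!ws || !ws->columns || ws->size == 0)
        return item_weights;
    if (position >= ws->size)
        position = ws->size - 1;
    return ws->columns[position].weights;
}
```

With the matrix from the commit message, position 0 would select the column {0x10000, 0x40000, 0x40000} for items 3, 7 and 13, and position 1 the column {0x100000, 0x10000, 0x10000}.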

libceph: apply_upmap() · 1c2e7b45

Committed by Ilya Dryomov on Jun 21, 2017

Previously, pg_to_raw_osds() didn't filter for existent OSDs because
raw_to_up_osds() would filter for "up" ("up" is predicated on "exists")
and raw_to_up_osds() was called directly after pg_to_raw_osds().  Now,
with apply_upmap() call in there, nonexistent OSDs in pg_to_raw_osds()
output can affect apply_upmap().  Introduce remove_nonexistent_osds()
to deal with that.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

1c2e7b45
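
The filtering step that 1c2e7b45 introduces can be illustrated with a small stand-alone function. This is only a sketch: `drop_nonexistent()` is a hypothetical name, the `exists` array stands in for the osdmap state, and compacting the array is an assumption (a real implementation may need to preserve positions for some pool types instead of shifting entries).

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Remove OSD ids that do not exist, compacting the array in place and
 * preserving the order of the survivors. Returns the new length.
 * exists[] is indexed by OSD id and has max_osd entries; ids outside
 * that range are treated as nonexistent.
 */
static size_t drop_nonexistent(int *osds, size_t len,
                               const bool *exists, int max_osd)
{
    size_t kept = 0;

    for (size_t i = 0; i < len; i++) {
        int osd = osds[i];

        if (osd >= 0 && osd < max_osd && exists[osd])
            osds[kept++] = osd;
    }
    return kept;
}
```

Running this on the raw mapping before any upmap-style rewrite means later steps only ever see ids that are present in the map.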

libceph: compute actual pgid in ceph_pg_to_up_acting_osds() · 463bb8da

Committed by Ilya Dryomov on Jun 21, 2017

Move raw_pg_to_pg() call out of get_temp_osds() and into
ceph_pg_to_up_acting_osds(), for upcoming apply_upmap().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

463bb8da

libceph: pg_upmap[_items] infrastructure · 6f428df4

Committed by Ilya Dryomov on Jun 21, 2017

pg_temp and pg_upmap encodings are the same (PG -> array of osds),
except for the incremental remove: it's an empty mapping in new_pg_temp
for pg_temp and a separate old_pg_upmap set for pg_upmap.  (This isn't
to allow for empty pg_upmap mappings -- apparently, pg_temp just wasn't
looked at as an example for pg_upmap encoding.)

Reuse __decode_pg_temp() for decoding pg_upmap and new_pg_upmap.
__decode_pg_temp() stores into pg_temp union member, but since pg_upmap
union member is identical, reading through pg_upmap later is OK.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

6f428df4

libceph: ceph_decode_skip_* helpers · 278b1d70

Committed by Ilya Dryomov on Jun 21, 2017

Some of these won't be as efficient as they could be (e.g.
ceph_decode_skip_set(... 32 ...) could advance by len * sizeof(u32)
once instead of advancing by sizeof(u32) len times), but that's fine
and not worth a bunch of extra macro code.

Replace skip_name_map() with ceph_decode_skip_map as an example.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

278b1d70
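
The trade-off mentioned in 278b1d70 (advancing once by len * sizeof(u32) instead of len single steps) can be shown with a hand-written skip routine. This sketch is not the macro code from the commit: `get_le32()` and `skip_u32_set()` are hypothetical names, and the wire layout (a little-endian u32 count followed by that many u32 elements) is stated here as an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian u32 without assuming alignment. */
static uint32_t get_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Skip a set encoded as a u32 count followed by that many u32 elements.
 * On success advances *p past the set and returns 0. If the buffer is
 * too short, returns -1 and leaves *p untouched.
 */
static int skip_u32_set(const unsigned char **p, const unsigned char *end)
{
    const unsigned char *cur = *p;
    uint32_t len;

    if ((size_t)(end - cur) < sizeof(uint32_t))
        return -1;
    len = get_le32(cur);
    cur += sizeof(uint32_t);

    /* One bounds check and one jump instead of len single steps. */
    if ((size_t)(end - cur) / sizeof(uint32_t) < len)
        return -1;
    cur += (size_t)len * sizeof(uint32_t);

    *p = cur;
    return 0;
}
```

The division in the bounds check avoids computing len * sizeof(u32) before the length has been validated, so the check itself cannot overflow.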

libceph: kill __{insert,lookup,remove}_pg_mapping() · ab75144b

Committed by Ilya Dryomov on Jun 21, 2017

Switch to DEFINE_RB_FUNCS2-generated {insert,lookup,erase}_pg_mapping().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ab75144b

libceph: introduce and switch to decode_pg_mapping() · a303bb0e

Committed by Ilya Dryomov on Jun 21, 2017

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

a303bb0e

libceph: don't pass pgid by value · 33333d10

Committed by Ilya Dryomov on Jun 21, 2017

Make __{lookup,remove}_pg_mapping() look like their ceph_spg_mapping
counterparts: take const struct ceph_pg *.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

33333d10

libceph: respect RADOS_BACKOFF backoffs · a02a946d

Committed by Ilya Dryomov on Jun 19, 2017

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

a02a946d

libceph: avoid unnecessary pi lookups in calc_target() · df28152d

Committed by Ilya Dryomov on Jun 15, 2017

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

df28152d

libceph: use target pi for calc_target() calculations · 6d637a54