- 07 7月, 2017 40 次提交
-
-
由 Ilya Dryomov 提交于
Reflects ceph.git commit dca1ae1e0a6b02029c3a7f9dec4114972be26d50. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
It is not just a pointer to crush_work, it is the whole structure. That is not a problem since it only contains a pointer. But it will be a problem if new data members are added to crush_work. Reflects ceph.git commit ee957dd431bfbeb6dadaf77764db8e0757417328. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
If there is no crush_choose_arg_map for a given pool, a NULL pointer is passed to preserve existing crush_do_rule() behavior. Reflects ceph.git commits 55fb91d64071552ea1bc65ab4ea84d3c8b73ab4b, dbe36e08be00c6519a8c89718dd47b0219c20516. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
bucket_straw2_choose needs to use weights that may be different from weight_items. For instance to compensate for an uneven distribution caused by a low number of values. Or to fix the probability biais introduced by conditional probabilities (see http://tracker.ceph.com/issues/15653 for more information). We introduce a weight_set for each straw2 bucket to set the desired weight for a given item at a given position. The weight of a given item when picking the first replica (first position) may be different from the weight the second replica (second position). For instance the weight matrix for a given bucket containing items 3, 7 and 13 could be as follows: position 0 position 1 item 3 0x10000 0x100000 item 7 0x40000 0x10000 item 13 0x40000 0x10000 When crush_do_rule picks the first of two replicas (position 0), item 7, 3 are four times more likely to be choosen by bucket_straw2_choose than item 13. When choosing the second replica (position 1), item 3 is ten times more likely to be choosen than item 7, 13. By default the weight_set of each bucket exactly matches the content of item_weights for each position to ensure backward compatibility. bucket_straw2_choose compares items by using their id. The same ids are also used to index buckets and they must be unique. For each item in a bucket an array of ids can be provided for placement purposes and they are used instead of the ids. If no replacement ids are provided, the legacy behavior is preserved. Reflects ceph.git commit 19537a450fd5c5a0bb8b7830947507a76db2ceca. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Previously, pg_to_raw_osds() didn't filter for existent OSDs because raw_to_up_osds() would filter for "up" ("up" is predicated on "exists") and raw_to_up_osds() was called directly after pg_to_raw_osds(). Now, with apply_upmap() call in there, nonexistent OSDs in pg_to_raw_osds() output can affect apply_upmap(). Introduce remove_nonexistent_osds() to deal with that. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Move raw_pg_to_pg() call out of get_temp_osds() and into ceph_pg_to_up_acting_osds(), for upcoming apply_upmap(). Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
pg_temp and pg_upmap encodings are the same (PG -> array of osds), except for the incremental remove: it's an empty mapping in new_pg_temp for pg_temp and a separate old_pg_upmap set for pg_upmap. (This isn't to allow for empty pg_upmap mappings -- apparently, pg_temp just wasn't looked at as an example for pg_upmap encoding.) Reuse __decode_pg_temp() for decoding pg_upmap and new_pg_upmap. __decode_pg_temp() stores into pg_temp union member, but since pg_upmap union member is identical, reading through pg_upmap later is OK. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Some of these won't be as efficient as they could be (e.g. ceph_decode_skip_set(... 32 ...) could advance by len * sizeof(u32) once instead of advancing by sizeof(u32) len times), but that's fine and not worth a bunch of extra macro code. Replace skip_name_map() with ceph_decode_skip_map as an example. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Switch to DEFINE_RB_FUNCS2-generated {insert,lookup,erase}_pg_mapping(). Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Make __{lookup,remove}_pg_mapping() look like their ceph_spg_mapping counterparts: take const struct ceph_pg *. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Initially for ceph_pg_mapping, ceph_spg_mapping and ceph_hobject_id, compared with ceph_pg_compare(), ceph_spg_compare() and hoid_compare() respectively. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
For luminous and beyond we are encoding the actual spgid, which requires operating with the correct pg_num, i.e. that of the target pool. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
need_check_tiering logic doesn't make a whole lot of sense. Drop it and apply tiering unconditionally on every calc_target() call instead. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Otherwise we may miss events like PG splits, pool deletions, etc when we get multiple incremental maps at once. Because check_pool_dne() can now be fed an unlinked request, finish_request() needed to be taught to handle unlinked requests. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
When processing a map update consisting of multiple incrementals, we may end up running check_linger_pool_dne() on a lingering request that was previously added to need_resend_linger list. If it is concluded that the target pool doesn't exist, the request is killed off while still on need_resend_linger list, which leads to a crash on a NULL lreq->osd in kick_requests(): libceph: linger_id 18446462598732840961 pool does not exist BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 IP: ceph_osdc_handle_map+0x4ae/0x870 Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Note that ceph_osd_request_target fields are updated regardless of RESEND_ON_SPLIT. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Replace it with more fine-grained bools to separate updating ceph_osd_request_target fields and the decision to resend. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Give upper layers a chance to reencode the message after the connection is negotiated and ->peer_features is set. OSD client will use this to support both luminous and pre-luminous OSDs (in a single cluster): the former need MOSDOp v8; the latter will continue to be sent MOSDOp v4. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Factor out encode_{pgid,oloc}() and use ceph_encode_string() for oid. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Store both raw pgid and actual spgid in ceph_osd_request_target. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
The old (v15) pi->last_force_request_resend has been repurposed to make pre-RESEND_ON_SPLIT clients that don't check for PG splits but do obey pi->last_force_request_resend resend on splits. See ceph.git commit 189ca7ec6420 ("mon/OSDMonitor: make pre-luminous clients resend ops on split"). Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Only MON_STATEFUL_SUB, really. MON_ROUTE_OSDMAP and OSDSUBOP_NO_SNAPCONTEXT are irrelevant. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
The code has been in place since commit 63244fa1 ("libceph: introduce ceph_osd_request_target, calc_target()"), and, with the ceph_{oloc,oid}_copy() issue fixed in the previous commit, is now in working order. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Reflects ceph.git commit ff1959282826ae6acd7134e1b1ede74ffd1cc04a. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Yan, Zheng 提交于
Current code does not update ceph_dentry_info::lease_session once it is set. If auth mds of corresponding dentry changes, dentry lease keeps in an invalid state. Signed-off-by: N"Yan, Zheng" <zyan@redhat.com> Reviewed-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Yan, Zheng 提交于
Current ceph uses FSID as primary index key of fscache data. This allows ceph to retain cached data across remount. But this causes problem (kernel opps, fscache does not support sharing data) when a filesystem get mounted several times (with fscache enabled, with different mount options). The fix is adding a new mount option, which specifies uniquifier for fscache. Signed-off-by: N"Yan, Zheng" <zyan@redhat.com> Acked-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Yan, Zheng 提交于
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Yan, Zheng 提交于
extra_mon_dispatch() and debugfs' foo_show functions dereference fsc->mdsc. we should clean up fsc->client->extra_mon_dispatch and debugfs before destroying fsc->mds. Signed-off-by: N"Yan, Zheng" <zyan@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Yan, Zheng 提交于
Previously we were returning values for quota, layout xattrs without any kind of update -- the user just got whatever happened to be in our cache. Clearly this extra round trip has a cost, but reads of these xattrs are fairly rare, happening on admin intervention rather than in normal operation. Link: http://tracker.ceph.com/issues/17939Signed-off-by: N"Yan, Zheng" <zyan@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Yan, Zheng 提交于
Don't re-send interrupted flock request in cases of mds failover and receiving request forward. Because corresponding 'lock intr' request may have been finished, it won't get re-sent. Link: http://tracker.ceph.com/issues/20170Signed-off-by: N"Yan, Zheng" <zyan@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Yan, Zheng 提交于
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Yan, Zheng 提交于
Ceph needs to flush dirty page in the order in which in which snap context they belong to. Dirty pages belong to older snap context should be flushed earlier. if writepage_nounlock() can not flush a page, it should redirty the page. Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: N"Yan, Zheng" <zyan@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Yan, Zheng 提交于
Callers of writepage_nounlock() have already ensured non-null page->mapping. Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: N"Yan, Zheng" <zyan@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-