Commits · a10bcb19ae02cea7d5e6650fbc2de3ced46b4e5d · openanolis / cloud-kernel

07 Jul, 2017 9 commits

libceph: resend on PG splits if OSD has RESEND_ON_SPLIT · 7de030d6

Committed by Ilya Dryomov on Jun 15, 2017

Note that ceph_osd_request_target fields are updated regardless of
RESEND_ON_SPLIT.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: MOSDOp v8 encoding (actual spgid + full hash) · 8cb441c0

Committed by Ilya Dryomov on Jun 15, 2017

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: ceph_connection_operations::reencode_message() method · 98ad5ebd

Committed by Ilya Dryomov on Jun 15, 2017

Give upper layers a chance to reencode the message after the connection
is negotiated and ->peer_features is set. OSD client will use this to
support both luminous and pre-luminous OSDs (in a single cluster): the
former need MOSDOp v8; the latter will continue to be sent MOSDOp v4.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: introduce ceph_spg, ceph_pg_to_primary_shard() · dc98ff72

Committed by Ilya Dryomov on Jun 15, 2017

Store both raw pgid and actual spgid in ceph_osd_request_target.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: fold [l]req->last_force_resend into ceph_osd_request_target · dc93e0e2

Committed by Ilya Dryomov on Jun 05, 2017

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: support SERVER_JEWEL feature bits · 220abf5a

Committed by Ilya Dryomov on Jun 05, 2017

Only MON_STATEFUL_SUB, really.  MON_ROUTE_OSDMAP and
OSDSUBOP_NO_SNAPCONTEXT are irrelevant.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: advertise support for OSD_POOLRESEND · 2d7522e0

Committed by Ilya Dryomov on Jun 05, 2017

The code has been in place since commit 63244fa1 ("libceph:
introduce ceph_osd_request_target, calc_target()"), and, with the
ceph_{oloc,oid}_copy() issue fixed in the previous commit, is now
in working order.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: new features macros · f179d3ba

Committed by Ilya Dryomov on Jun 05, 2017

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: remove ceph_sanitize_features() workaround · dcbbd97c

Committed by Ilya Dryomov on Jun 05, 2017

Reflects ceph.git commit ff1959282826ae6acd7134e1b1ede74ffd1cc04a.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

24 May, 2017 1 commit

libceph: use kbasename() and kill ceph_file_part() · 6f4dbd14

Committed by Ilya Dryomov on May 19, 2017

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>

04 May, 2017 11 commits

ceph: fix file open flags on ppc64 · f775ff7d

Committed by Alexander Graf on Apr 27, 2017

The file open flags (O_foo) are platform specific and should never go
out to an interface that is not local to the system.

Unfortunately these flags have leaked out onto the wire in the cephfs
implementation. That led to bogus flags getting transmitted on ppc64.

This patch converts the kernel view of flags to the ceph view of file
open flags.

Fixes: 124e68e7 ("ceph: file operations")
Signed-off-by: Alexander Graf <agraf@suse.de>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
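The conversion described in this commit can be illustrated with a small userspace sketch. It is not the patch itself: the `WIRE_O_*` constants and the function names are hypothetical, chosen only to show that each local `O_*` bit is tested by name and mapped to a fixed protocol value instead of being copied across as a raw integer.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>

/*
 * Hypothetical wire constants for this sketch only. The point is that
 * they are fixed by the protocol and never depend on the local O_* values.
 */
#define WIRE_O_RDONLY 0x0000u
#define WIRE_O_WRONLY 0x0001u
#define WIRE_O_RDWR   0x0002u
#define WIRE_O_CREAT  0x0040u
#define WIRE_O_EXCL   0x0080u
#define WIRE_O_TRUNC  0x0200u

/* Translate local open(2) flags into the fixed wire encoding. */
static uint32_t wire_flags_from_local(int flags)
{
    uint32_t wire;

    switch (flags & O_ACCMODE) {
    case O_WRONLY:
        wire = WIRE_O_WRONLY;
        break;
    case O_RDWR:
        wire = WIRE_O_RDWR;
        break;
    default:
        wire = WIRE_O_RDONLY;
        break;
    }

    /* Test each local bit by name; never copy the raw value across. */
    if (flags & O_CREAT)
        wire |= WIRE_O_CREAT;
    if (flags & O_EXCL)
        wire |= WIRE_O_EXCL;
    if (flags & O_TRUNC)
        wire |= WIRE_O_TRUNC;

    return wire;
}

/* Usage: the result is the same on every platform, whatever O_CREAT is. */
static void wire_flags_selfcheck(void)
{
    assert(wire_flags_from_local(O_RDWR | O_CREAT) ==
           (WIRE_O_RDWR | WIRE_O_CREAT));
}
```

Flags that have no wire equivalent are simply dropped in this sketch; whether the real patch drops or reports them is not stated in the commit message.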

rbd: support updating the lock cookie without releasing the lock · 14bb211d

Committed by Ilya Dryomov on Apr 13, 2017

As we no longer release the lock before potentially raising BLACKLISTED
in rbd_reregister_watch(), the "either locked or blacklisted" assert in
rbd_queue_workfn() needs to go: we can be both locked and blacklisted
at that point now.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>

libceph: add an epoch_barrier field to struct ceph_osd_client · 58eb7932

Committed by Jeff Layton on Apr 18, 2017

Cephfs can get cap update requests that contain a new epoch barrier in
them. When that happens we want to pause all OSD traffic until the right
map epoch arrives.

Add an epoch_barrier field to ceph_osd_client that is protected by the
osdc->lock rwsem. When the barrier is set, and the current OSD map
epoch is below that, pause the request target when submitting the
request or when revisiting it. Add a way for upper layers (cephfs)
to update the epoch_barrier as well.

If we get a new map, compare the new epoch against the barrier before
kicking requests and request another map if the map epoch is still lower
than the one we want.

If we get a map with a full pool, or at quota condition, then set the
barrier to the current epoch value.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
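The barrier logic in this commit message can be sketched in plain C. This is a simplified model under stated assumptions: `struct osd_client_sketch` and all function names are hypothetical, the rwsem locking mentioned above is omitted, and the rule that the barrier only moves forward is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the OSD client state; not the kernel struct. */
struct osd_client_sketch {
    uint32_t map_epoch;     /* epoch of the OSD map currently held */
    uint32_t epoch_barrier; /* 0 means no barrier is set */
};

/* Requests stay paused while the held map is older than the barrier. */
static bool should_pause(const struct osd_client_sketch *c)
{
    assert(c);
    return c->epoch_barrier != 0 && c->map_epoch < c->epoch_barrier;
}

/*
 * Entry point for upper layers. Assumption of this sketch: the barrier
 * only ever moves forward. Returns true if it was raised.
 */
static bool update_epoch_barrier(struct osd_client_sketch *c, uint32_t eb)
{
    assert(c);
    if (eb <= c->epoch_barrier)
        return false;
    c->epoch_barrier = eb;
    return true;
}

/*
 * Called when a new map arrives. Returns true if the map is still older
 * than the barrier, meaning another map should be requested before
 * kicking requests.
 */
static bool handle_new_map(struct osd_client_sketch *c, uint32_t epoch)
{
    assert(c);
    if (epoch > c->map_epoch)
        c->map_epoch = epoch;
    return should_pause(c);
}
```

With a held map at epoch 10 and a barrier raised to 12, a map at epoch 11 leaves requests paused and a map at epoch 12 releases them.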

libceph: allow requests to return immediately on full conditions if caller wishes · a1f4020a

Committed by Jeff Layton on Apr 04, 2017

Usually, when the osd map is flagged as full or the pool is at quota,
write requests just hang. This is not what we want for cephfs, where
it would be better to simply report -ENOSPC back to userland instead
of stalling.

If the caller knows that it will want an immediate error return instead
of blocking on a full or at-quota error condition then allow it to set a
flag to request that behavior.

Set that flag in ceph_osdc_new_request (since ceph.ko is the only caller),
and on any other write request from ceph.ko.

A later patch will deal with requests that were submitted before the new
map showing the full condition came in.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
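A minimal sketch of the decision this commit describes, assuming a per-request flag set by the caller. The struct, the enum, and the field name `abort_on_full` are hypothetical names for illustration, not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* What the submit path decides to do with a write request. */
enum submit_action { SUBMIT_SEND, SUBMIT_PAUSE, SUBMIT_FAIL };

/* Hypothetical request model; abort_on_full is the caller-set flag. */
struct write_req_sketch {
    bool abort_on_full;
    int result; /* 0 until the request completes or fails */
};

static enum submit_action submit_write(struct write_req_sketch *req,
                                       bool full_or_at_quota)
{
    assert(req);

    if (!full_or_at_quota)
        return SUBMIT_SEND;

    if (req->abort_on_full) {
        /* The caller asked for an immediate error instead of a stall. */
        req->result = -ENOSPC;
        return SUBMIT_FAIL;
    }

    /* Default behaviour: the request waits for the condition to clear. */
    return SUBMIT_PAUSE;
}
```

The sketch only covers requests submitted after the full condition is known; as the message notes, requests already in flight are handled by a later patch.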

libceph: remove req->r_replay_version · aa26d662

Committed by Jeff Layton on Apr 04, 2017

Nothing uses this anymore with the removal of the ack vs. commit code.
Remove the field and just encode zeroes into place in the request
encoding.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: make seeky readdir more efficient · 79162547

Committed by Yan, Zheng on Apr 05, 2017

The current cephfs client uses a string to indicate the start position
of readdir. The string is the last entry of the previous readdir reply.
This approach does not work for seeky readdir because we cannot
easily convert the new position to a string. For seeky readdir, the
mds needs to return dentries from the beginning, and the client keeps
retrying if the reply does not contain the dentry it wants.

In the current version of ceph, the mds sorts CDentry in its cache in
hash order. The client also uses the dentry hash to compose the dir
position. For seeky readdir, if the client passes the hash part of the
dir position to the mds, the mds can avoid replying with useless
dentries.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: allow connecting to mds whose rank >= mdsmap::m_max_mds · 76201b63

Committed by Yan, Zheng on Mar 28, 2017

mdsmap::m_max_mds is the expected count of active mds. It's not the
max rank of active mds. A user can decrease mdsmap::m_max_mds, but that
does not stop any mds whose rank >= mdsmap::m_max_mds.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: convert ceph_pagelist.refcnt from atomic_t to refcount_t · 0e1a5ee6

Committed by Elena Reshetova on Mar 17, 2017

The refcount_t type and its corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: convert ceph_osd.o_ref from atomic_t to refcount_t · 02113a0f

Committed by Elena Reshetova on Mar 17, 2017

The refcount_t type and its corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: convert ceph_snap_context.nref from atomic_t to refcount_t · 06dfa963

Committed by Elena Reshetova on Mar 17, 2017

The refcount_t type and its corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
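The overflow protection that motivates the three conversions above can be modelled in a few lines. This is a deliberately simplified, non-atomic userspace model, not the kernel's refcount_t: the type, the function names, and the choice of `UINT_MAX` as the saturation value are assumptions of the sketch. The idea it shows is that a saturating counter sticks at its maximum and leaks the object, where a wrapping counter would reach zero and free an object that is still in use.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical, non-atomic model of a saturating reference counter. */
struct ref_sketch {
    unsigned int refs;
};

#define REF_SKETCH_SATURATED UINT_MAX

static void ref_sketch_inc(struct ref_sketch *r)
{
    /* Taking a reference on an object with no references is a bug. */
    assert(r && r->refs != 0);

    /* Stick at the maximum instead of wrapping around to zero. */
    if (r->refs != REF_SKETCH_SATURATED)
        r->refs++;
}

/* Returns true when the caller dropped the last reference. */
static bool ref_sketch_dec_and_test(struct ref_sketch *r)
{
    assert(r && r->refs != 0);

    /* A saturated counter is never released: leak, do not free. */
    if (r->refs == REF_SKETCH_SATURATED)
        return false;

    return --r->refs == 0;
}
```

Trading a possible use-after-free for a bounded memory leak is the design choice the commit messages point at.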

libceph, ceph: always advertise all supported features · 74da4a0f

Committed by Ilya Dryomov on Mar 03, 2017

No reason to hide CephFS-specific features in the rbd case.  Recent
feature bits mix RADOS and CephFS-specific stuff together anyway.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

07 Mar, 2017 1 commit

libceph: osd_request_timeout option · 7cc5e38f

Committed by Ilya Dryomov on Feb 12, 2017

osd_request_timeout specifies how many seconds to wait for a response
from OSDs before returning -ETIMEDOUT from an OSD request.  0 (default)
means no limit.

osd_request_timeout is osdkeepalive-precise -- in-flight requests are
swept through every osdkeepalive seconds.  With ack vs commit behaviour
gone, abort_request() is really simple.

This is based on a patch from Artur Molchanov <artur.molchanov@synesis.ru>.

Tested-by: Artur Molchanov <artur.molchanov@synesis.ru>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
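The "osdkeepalive-precise" remark can be made concrete with a sketch of a periodic sweep. The names below are hypothetical and time is modelled as plain seconds; the only behaviour taken from the commit message is that a timeout of 0 means no limit, that in-flight requests are checked once per keepalive interval, and that expired requests fail with -ETIMEDOUT.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical in-flight request model. */
struct req_sketch {
    unsigned long start; /* second at which the request was sent */
    int result;          /* 0 while in flight, negative errno once aborted */
};

/*
 * One sweep over the in-flight requests, meant to run once per keepalive
 * interval. Returns the number of requests aborted by this sweep.
 */
static size_t sweep_requests(struct req_sketch *reqs, size_t n,
                             unsigned long now, unsigned long timeout)
{
    size_t aborted = 0;

    assert(reqs || n == 0);

    if (timeout == 0) /* 0 means no limit */
        return 0;

    for (size_t i = 0; i < n; i++) {
        if (reqs[i].result == 0 && now - reqs[i].start >= timeout) {
            reqs[i].result = -ETIMEDOUT;
            aborted++;
        }
    }
    return aborted;
}
```

With a 5 second sweep interval and a 7 second timeout, a request sent at second 0 survives the sweep at second 5 and is aborted by the sweep at second 10, so the effective timeout is rounded up to the next sweep.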

25 Feb, 2017 1 commit

libceph: get rid of ack vs commit · b18b9550

Committed by Ilya Dryomov on Feb 11, 2017

- CEPH_OSD_FLAG_ACK shouldn't be set anymore, so assert on it
- remove support for handling ack replies (OSDs will send ack replies
  only if clients request them)
- drop the "do lingering callbacks under osd->lock" logic from
  handle_reply() -- lreq->lock is sufficient in all three cases

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>

20 Feb, 2017 5 commits

rbd: kill obj_request->object_name and rbd_segment_name_cache · 6c696d85

Committed by Ilya Dryomov on Jan 25, 2017

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>

libceph: bump CEPH_PG_MAX_SIZE to 32 · 083a51fb

Committed by Ilya Dryomov on Feb 09, 2017

... to accommodate potentially very wide EC pools.  This increases the
size of a typical rbd ceph_osd_request by ~12% (from 1040 to 1168 bytes),
but I'd rather go future proof here.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>

crush: merge working data and scratch · 743efcff

Committed by Ilya Dryomov on Jan 31, 2017

Much like Arlo Guthrie, I decided that one big pile is better than two
little piles.

Reflects ceph.git commit 95c2df6c7e0b22d2ea9d91db500cf8b9441c73ba.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

crush: remove mutable part of CRUSH map · 66a0e2d5

Committed by Ilya Dryomov on Jan 31, 2017

Then add it to the working state. It would be very nice if we didn't
have to take a lock to calculate a crush placement. By moving the
permutation array into the working data, we can treat the CRUSH map as
immutable.

Reflects ceph.git commit cbcd039651c0569551cb90d26ce27e1432671f2a.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: use BUG() instead of BUG_ON(1) · d24cdcd3

Committed by Arnd Bergmann on Jan 16, 2017

I ran into this compile warning, which is the result of BUG_ON(1)
not always leading to the compiler treating the code path as
unreachable:

    include/linux/ceph/osdmap.h: In function 'ceph_can_shift_osds':
    include/linux/ceph/osdmap.h:62:1: error: control reaches end of non-void function [-Werror=return-type]

Using BUG() here avoids the warning.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
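A userspace sketch of the pattern behind this fix, under stated assumptions: `SKETCH_BUG()` is a hypothetical stand-in that expands to `abort()`, which is declared as not returning, so the compiler can see that control never falls off the end of the function. The function and its pool-type values are illustrative only and are not copied from osdmap.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in: an unconditional call to a noreturn function. */
#define SKETCH_BUG() abort()

/* Illustrative pool types for this sketch. */
enum pool_type_sketch { POOL_REPLICATED = 1, POOL_ERASURE = 3 };

/*
 * Every path either returns a value or ends in a call the compiler knows
 * cannot return, so no "control reaches end of non-void function"
 * diagnostic is expected here.
 */
static bool can_shift_sketch(enum pool_type_sketch type)
{
    switch (type) {
    case POOL_REPLICATED:
        return true;
    case POOL_ERASURE:
        return false;
    default:
        SKETCH_BUG();
    }
}
```

A conditional form such as `if (cond) abort();` leaves that judgment to how the macro happens to expand in a given configuration, which matches the "not always" wording in the commit message.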

15 Dec, 2016 1 commit

libceph: always signal completion when done · c297eb42

Committed by Ilya Dryomov on Dec 02, 2016

r_safe_completion is currently, and has always been, signaled only if
on-disk ack was requested. It's there for fsync and syncfs, which wait
for in-flight writes to flush - all data write requests set ONDISK.

However, the pool perm check code introduced in 4.2 sends a write
request with only ACK set. An unfortunately timed syncfs can then hang
forever: r_safe_completion won't be signaled because only an unsafe
reply was requested.

We could patch ceph_osdc_sync() to skip !ONDISK write requests, but
that is somewhat incomplete and yet another special case. Instead,
rename this completion to r_done_completion and always signal it when
the OSD client is done with the request, whether unsafe, safe, or
error. This is a bit cleaner and helps with the cancellation code.

Reported-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

13 Dec, 2016 3 commits

ceph: add flags parameter to send_cap_msg · 1e4ef0c6

Committed by Jeff Layton on Nov 10, 2016

Add a flags parameter to send_cap_msg, so we can request expedited
service from the MDS when we know we'll be waiting on the result.

Set that flag in the case of try_flush_caps. The callers of that
function generally wait synchronously on the result, so it's beneficial
to ask the server to expedite it.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>

ceph: check availability of mds cluster on mount · e9e427f0

Committed by Yan, Zheng on Nov 10, 2016

Signed-off-by: Yan, Zheng <zyan@redhat.com>

libceph: drop len argument of *verify_authorizer_reply() · 0dde5848

Committed by Ilya Dryomov on Dec 02, 2016

The length of the reply is protocol-dependent - for cephx it's
ceph_x_authorize_reply.  Nothing sensible can be passed from the
messenger layer anyway.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>

11 Nov, 2016 1 commit

libceph: initialize last_linger_id with a large integer · 264048af

Committed by Ilya Dryomov on Nov 08, 2016

osdc->last_linger_id is a counter for lreq->linger_id, which is used
for watch cookies.  Starting with a large integer should ease the task
of telling apart kernel and userspace clients.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

01 Nov, 2016 1 commit

ceph: don't include blk_types.h in messenger.h · 9f082171

Committed by Christoph Hellwig on Nov 01, 2016

The file only needs the struct bvec_iter declaration, which is available
from bvec.h.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

03 Oct, 2016 1 commit

ceph: handle CEPH_SESSION_REJECT message · fcff415c

Committed by Yan, Zheng on Sep 14, 2016

Signed-off-by: Yan, Zheng <zyan@redhat.com>

25 Aug, 2016 5 commits

rbd: add 'client_addr' sysfs rbd device attribute · 005a07bf

Committed by Ilya Dryomov on Aug 18, 2016

Export client addr/nonce, so userspace can check if an image is being
blacklisted.

Signed-off-by: Mike Christie <mchristi@redhat.com>
[idryomov@gmail.com: ceph_client_addr(), endianness fix]
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: rename ceph_client_id() -> ceph_client_gid() · 033268a5

Committed by Ilya Dryomov on Aug 12, 2016

It's gid / global_id in other places.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>

libceph: support for blacklisting clients · 6305a3b4

Committed by Douglas Fuller on Jul 22, 2015

Reuse ceph_mon_generic_request infrastructure for sending monitor
commands.  In particular, add support for 'blacklist add' to prevent
other, non-responsive clients from making further updates.

Signed-off-by: Douglas Fuller <dfuller@redhat.com>
[idryomov@gmail.com: refactor, misc fixes throughout]
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>

libceph: support for lock.lock_info · d4ed4a53

Committed by Douglas Fuller on Jun 29, 2015

Add an interface for the Ceph OSD lock.lock_info method and associated
data structures.

Based heavily on code by Mike Christie <michaelc@cs.wisc.edu>.

Signed-off-by: Douglas Fuller <dfuller@redhat.com>
[idryomov@gmail.com: refactor, misc fixes throughout]
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>

libceph: support for advisory locking on RADOS objects · f66241cb

Committed by Douglas Fuller on Jun 18, 2015

This patch adds support for rados lock, unlock and break lock.

Based heavily on code by Mike Christie <michaelc@cs.wisc.edu>.

Signed-off-by: Douglas Fuller <dfuller@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>

openanolis / cloud-kernel synced successfully more than 1 year ago
