Commits · d50c97b566c5bbf990eff472e9feaa58fdebdd33 · openeuler / Kernel

25 Jun, 2015 2 commits

libceph: allow setting osd_req_op's flags · 144cba14

Committed by Yan, Zheng on Apr 27, 2015

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>

144cba14

libceph: properly release STAT request's raw_data_in · 66ba609f

Committed by Yan, Zheng on Apr 27, 2015

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>

66ba609f

21 May, 2015 2 commits

Revert "libceph: clear r_req_lru_item in __unregister_linger_request()" · 521a04d0

Committed by Ilya Dryomov on May 11, 2015

This reverts commit ba9d114e.

.. which introduced a regression that prevented all lingering requests
requeued in kick_requests() from ever being sent to the OSDs, resulting
in a lot of missed notifies.  In retrospect it's pretty obvious that
r_req_lru_item in the case of lingering requests can be used not
only for notarget, but also for unsent linkage due to how tightly
actual map and enqueue operations are coupled in __map_request().

The assertion that was being silenced is taken care of in the previous
("libceph: request a new osdmap if lingering request maps to no osd")
commit: by always kicking homeless lingering requests we ensure that
none of them ends up on the notarget list outside of the critical
section guarded by request_mutex.

Cc: stable@vger.kernel.org # 3.18+, needs b0494532 "libceph: request a new osdmap if lingering request maps to no osd"
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>

521a04d0

libceph: request a new osdmap if lingering request maps to no osd · b0494532

Committed by Ilya Dryomov on May 11, 2015

This commit does two things.  First, if there are any homeless
lingering requests, we now request a new osdmap even if the osdmap that
is being processed brought no changes, i.e. if a given lingering
request turned homeless in one of the previous epochs and remained
homeless in the current epoch.  Not doing so leaves us with a stale
osdmap and as a result we may miss our window for reestablishing the
watch and lose notifies.

MON=1 OSD=1:

    # cat linger-needmap.sh
    #!/bin/bash
    rbd create --size 1 test
    DEV=$(rbd map test)
    ceph osd out 0
    rbd map dne/dne # obtain a new osdmap as a side effect (!)
    sleep 1
    ceph osd in 0
    rbd resize --size 2 test
    # rbd info test | grep size -> 2M
    # blockdev --getsize $DEV -> 1M

N.B.: Not obtaining a new osdmap in between "osd out" and "osd in"
above is enough to make it miss that resize notify, but that is a
bug^Wlimitation of ceph watch/notify v1.

Second, homeless lingering requests are now kicked just like those
lingering requests whose mapping has changed.  This is mainly to
recognize that a homeless lingering request makes no sense and to
preserve the invariant that a registered lingering request is not
sitting on any of r_req_lru_item lists.  This spares us a WARN_ON,
which commit ba9d114e ("libceph: clear r_req_lru_item in
__unregister_linger_request()") tried to fix the _wrong_ way.

Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>

b0494532

19 Feb, 2015 2 commits

libceph: kfree() in put_osd() shouldn't depend on authorizer · b28ec2f3

Committed by Ilya Dryomov on Feb 16, 2015

a255651d ("ceph: ensure auth ops are defined before use") made
kfree() in put_osd() conditional on the authorizer.  A mechanical
mistake most likely - fix it.

Cc: Alex Elder <elder@linaro.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>

b28ec2f3

libceph: fix double __remove_osd() problem · 7eb71e03

Committed by Ilya Dryomov on Feb 17, 2015

It turns out it's possible to get __remove_osd() called twice on the
same OSD.  That doesn't sit well with rb_erase() - depending on the
shape of the tree we can get a NULL dereference, a soft lockup or
a random crash at some point in the future as we end up touching freed
memory.  One scenario that I was able to reproduce is as follows:

            <osd3 is idle, on the osd lru list>
<con reset - osd3>
con_fault_finish()
  osd_reset()
                              <osdmap - osd3 down>
                              ceph_osdc_handle_map()
                                <takes map_sem>
                                kick_requests()
                                  <takes request_mutex>
                                  reset_changed_osds()
                                    __reset_osd()
                                      __remove_osd()
                                  <releases request_mutex>
                                <releases map_sem>
    <takes map_sem>
    <takes request_mutex>
    __kick_osd_requests()
      __reset_osd()
        __remove_osd() <-- !!!

A case can be made that osd refcounting is imperfect and reworking it
would be a proper resolution, but for now Sage and I decided to fix
this by adding a safeguard around __remove_osd().

Fixes: http://tracker.ceph.com/issues/8087

Cc: Sage Weil <sage@redhat.com>
Cc: stable@vger.kernel.org # 3.9+: 7c6e6fc5: libceph: assert both regular and lingering lists in __remove_osd()
Cc: stable@vger.kernel.org # 3.9+: cc9f1f51: libceph: change from BUG to WARN for __remove_osd() asserts
Cc: stable@vger.kernel.org # 3.9+
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>

7eb71e03
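
The guard idea in the commit above can be illustrated with a minimal user-space sketch. This is not the kernel code: `struct node`, `struct list`, `list_add()` and `remove_once()` are hypothetical names, and a plain doubly linked list stands in for the rb-tree. The point is only that the node records whether it is still linked, so a second removal becomes a harmless no-op instead of touching stale pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an object tracked in a container. */
struct node {
	struct node *prev, *next;
	bool linked;		/* true while the node is on the list */
};

struct list {
	struct node head;	/* sentinel, never removed */
	size_t len;
};

static void node_init(struct node *n)
{
	n->prev = n->next = n;
	n->linked = false;
}

static void list_init(struct list *l)
{
	node_init(&l->head);
	l->len = 0;
}

/* Insert n right after the sentinel; n must not already be linked. */
static void list_add(struct list *l, struct node *n)
{
	assert(!n->linked);
	n->next = l->head.next;
	n->prev = &l->head;
	l->head.next->prev = n;
	l->head.next = n;
	n->linked = true;
	l->len++;
}

/*
 * Remove n if it is still linked. Returns true when the node was
 * removed, false when it had already been removed earlier.
 */
static bool remove_once(struct list *l, struct node *n)
{
	if (!n->linked)
		return false;	/* the guard: second call does nothing */
	n->prev->next = n->next;
	n->next->prev = n->prev;
	node_init(n);		/* leave the node in a known, unlinked state */
	l->len--;
	return true;
}
```

Calling `remove_once()` twice on the same node removes it once and reports `false` the second time, which is the behavior the two racing paths in the trace above would need.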

18 Dec, 2014 4 commits

libceph: specify position of extent operation · 715e4cd4

Committed by Yan, Zheng on Nov 13, 2014

Allow specifying the position of an extent operation in a
multi-operation osd request. This is required for cephfs to convert
inline data to normal data (compare xattr, then write object).

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>

715e4cd4

libceph: add CREATE osd operation support · 864e9197

Committed by Yan, Zheng on Nov 13, 2014

Add CEPH_OSD_OP_CREATE support.  Also change libceph to not treat
CEPH_OSD_OP_DELETE as an extent op and add an assert to that end.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>

864e9197

libceph: add SETXATTR/CMPXATTR osd operations support · d74b50be

Committed by Yan, Zheng on Nov 12, 2014

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>

d74b50be

libceph: message signature support · 33d07337

Committed by Yan, Zheng on Nov 04, 2014

Signed-off-by: Yan, Zheng <zyan@redhat.com>

33d07337

14 Nov, 2014 3 commits

libceph: change from BUG to WARN for __remove_osd() asserts · cc9f1f51

Committed by Ilya Dryomov on Nov 05, 2014

No reason to use BUG_ON for osd request list assertions.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>

cc9f1f51

libceph: clear r_req_lru_item in __unregister_linger_request() · ba9d114e

Committed by Ilya Dryomov on Nov 05, 2014

kick_requests() can put linger requests on the notarget list.  This
means we need to clear the much-overloaded req->r_req_lru_item in
__unregister_linger_request() as well, or we get an assertion failure
in ceph_osdc_release_request() - !list_empty(&req->r_req_lru_item).

AFAICT the assumption was that registered linger requests cannot be on
any of req->r_req_lru_item lists, but that's clearly not the case.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>

ba9d114e

libceph: unlink from o_linger_requests when clearing r_osd · a390de02

Committed by Ilya Dryomov on Nov 04, 2014

Requests have to be unlinked from both osd->o_requests (normal
requests) and osd->o_linger_requests (linger requests) lists when
clearing req->r_osd.  Otherwise __unregister_linger_request() gets
confused and we trip over a !list_empty(&osd->o_linger_requests)
assert in __remove_osd().

MON=1 OSD=1:

    # cat remove-osd.sh
    #!/bin/bash
    rbd create --size 1 test
    DEV=$(rbd map test)
    ceph osd out 0
    sleep 3
    rbd map dne/dne # obtain a new osdmap as a side effect
    rbd unmap $DEV & # will block
    sleep 3
    ceph osd in 0

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>

a390de02

15 Oct, 2014 5 commits

libceph: sync osd op definitions in rados.h · 70b5bfa3

Committed by Ilya Dryomov on Oct 02, 2014

Bring in missing osd ops and strings, use macros to eliminate multiple
points of maintenance.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>

70b5bfa3

libceph: don't try checking queue_work() return value · 91883cd2

Committed by Ilya Dryomov on Sep 11, 2014

queue_work() doesn't "fail to queue", it returns false if work was
already on a queue, which can't happen here since we allocate
event_work right before we queue it.  So don't bother at all.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

91883cd2

libceph: Convert pr_warning to pr_warn · b9a67899

Committed by Joe Perches on Sep 09, 2014

Use the more common pr_warn.

Other miscellanea:

o Coalesce formats
o Realign arguments

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>

b9a67899

libceph: resend lingering requests with a new tid · 2cc6128a

Committed by Ilya Dryomov on Sep 03, 2014

Both not yet registered (r_linger && list_empty(&r_linger_item)) and
registered linger requests should use the new tid on resend to avoid
the dup op detection logic on the OSDs, yet we were doing this only for
"registered" case. Factor out and simplify the "registered" logic and
use the new helper for "not registered" case as well.

Fixes: http://tracker.ceph.com/issues/8806

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

2cc6128a
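
As a rough illustration of the resend rule in the commit above, here is a user-space sketch under stated assumptions. `struct client`, `struct request`, `next_tid()` and `prepare_resend()` are hypothetical names, not libceph's; the sketch also assumes that non-lingering requests keep their tid on resend, which the commit message does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical client state: the last transaction id handed out. */
struct client {
	uint64_t last_tid;
};

/* Hypothetical request: only the fields this sketch needs. */
struct request {
	uint64_t tid;
	bool linger;	/* true for lingering (watch-style) requests */
};

/* Hand out the next tid; tids only ever grow, so none is reused. */
static uint64_t next_tid(struct client *c)
{
	return ++c->last_tid;
}

/*
 * Prepare a request for resend. A lingering request gets a new tid
 * whether or not it has been registered yet, so the receiver cannot
 * mistake it for a duplicate of the earlier send. Other requests keep
 * their tid (an assumption of this sketch).
 */
static void prepare_resend(struct client *c, struct request *req)
{
	assert(req->tid != 0);	/* must have been sent at least once */
	if (req->linger)
		req->tid = next_tid(c);
}
```

The single helper covers both the registered and the not-yet-registered case because it only looks at the `linger` flag, which mirrors the "factor out and reuse" shape the message describes.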

libceph: abstract out ceph_osd_request enqueue logic · f671b581

Committed by Ilya Dryomov on Sep 02, 2014

Introduce __enqueue_request() and switch to it.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

f671b581

08 Jul, 2014 9 commits

libceph: nuke ceph_osdc_unregister_linger_request() · 2d05f082

Committed by Ilya Dryomov on Jun 24, 2014

Remove now unused ceph_osdc_unregister_linger_request().

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

2d05f082

libceph: introduce ceph_osdc_cancel_request() · c9f9b93d

Committed by Ilya Dryomov on Jun 19, 2014

Introduce ceph_osdc_cancel_request() intended for canceling requests
from the higher layers (rbd and cephfs).  Because higher layers are in
charge and are supposed to know what and when they are canceling, the
request is not completed, only unref'ed and removed from the libceph
data structures.

__cancel_request() is no longer called before __unregister_request(),
because __unregister_request() unconditionally revokes r_request and
there is no point in trying to do it twice.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

c9f9b93d

libceph: fix linger request check in __unregister_request() · 4f23409e

Committed by Ilya Dryomov on Jun 20, 2014

We should check if request is on the linger request list of any of the
OSDs, not whether request is registered or not.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

4f23409e

libceph: unregister only registered linger requests · af593064

Committed by Ilya Dryomov on Jun 20, 2014

Linger requests that have not yet been registered should not be
unregistered by __unregister_linger_request().  This messes up ref
count and leads to use-after-free.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

af593064

libceph: assert both regular and lingering lists in __remove_osd() · 7c6e6fc5

Committed by Ilya Dryomov on Jun 18, 2014

It is important that both regular and lingering requests lists are
empty when the OSD is removed.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

7c6e6fc5

libceph: harden ceph_osdc_request_release() a bit · 6562d661

Committed by Ilya Dryomov on Jun 20, 2014

Add some WARN_ONs to alert us when we try to destroy requests that are
still registered.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

6562d661

libceph: move and add dout()s to ceph_osdc_request_{get,put}() · 9e94af20

Committed by Ilya Dryomov on Jun 20, 2014

Add dout()s to ceph_osdc_request_{get,put}().  Also move them to .c and
turn kref release callback into a static function.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

9e94af20

libceph: add maybe_move_osd_to_lru() and switch to it · bbf37ec3

Committed by Ilya Dryomov on Jun 20, 2014

Abstract out __move_osd_to_lru() logic from __unregister_request() and
__unregister_linger_request().

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

bbf37ec3

libceph: rename ceph_osd_request::r_linger_osd to r_linger_osd_item · 1d0326b1

Committed by Ilya Dryomov on Jun 20, 2014

So that:

req->r_osd_item --> osd->o_requests list
req->r_linger_osd_item --> osd->o_linger_requests list

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

1d0326b1

12 Jun, 2014 1 commit

ceph: remove bogus extern · f6479449

Committed by stephen hemminger on Jun 10, 2014

Sparse complained about this bogus extern on definition of
a function.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

f6479449

05 Apr, 2014 2 commits

libceph: return primary from ceph_calc_pg_acting() · 8008ab10

Committed by Ilya Dryomov on Mar 24, 2014

In preparation for adding support for primary_temp, stop assuming
primaryness: add a primary out parameter to ceph_calc_pg_acting() and
change call sites accordingly.  Primary is now specified separately
from the order of osds in the set.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

8008ab10

libceph: split osdmap allocation and decode steps · a2505d63

Committed by Ilya Dryomov on Mar 13, 2014

Split osdmap allocation and initialization into a separate function,
ceph_osdmap_decode().

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

a2505d63

03 Apr, 2014 2 commits

libceph: add support for CEPH_OSD_OP_SETALLOCHINT osd op · c647b8a8

Committed by Ilya Dryomov on Feb 25, 2014

This is primarily for rbd's benefit and is supposed to combat
fragmentation:

"... knowing that rbd images have a 4m size, librbd can pass a hint
that will let the osd do the xfs allocation size ioctl on new files so
that they are allocated in 1m or 4m chunks.  We've seen cases where
users with rbd workloads have very high levels of fragmentation in xfs
and this would mitigate that and probably have a pretty nice
performance benefit."

SETALLOCHINT is considered advisory, so our backwards compatibility
mechanism here is to set FAILOK flag for all SETALLOCHINT ops.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

c647b8a8

libceph: encode CEPH_OSD_OP_FLAG_* op flags · 7b25bf5f

Committed by Ilya Dryomov on Feb 25, 2014

Encode ceph_osd_op::flags field so that it gets sent over the wire.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

7b25bf5f

14 Feb, 2014 1 commit

net: remove unnecessary return's · 2045ceae

Committed by stephen hemminger on Feb 12, 2014

One of my pet coding style peeves is the practice of
adding extra return; at the end of function.
Kill several instances of this in network code.

I suppose some coccinelle wizardry could do this automatically.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

2045ceae

08 Feb, 2014 2 commits

libceph: take map_sem for read in handle_reply() · ff513ace

Committed by Ilya Dryomov on Feb 03, 2014

Handling redirect replies requires both map_sem and request_mutex.
Taking map_sem unconditionally near the top of handle_reply() avoids
possible race conditions that arise from releasing request_mutex to be
able to acquire map_sem in redirect reply case.  (Lock ordering is:
map_sem, request_mutex, crush_mutex.)

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

ff513ace

libceph: factor out logic from ceph_osdc_start_request() · 0bbfdfe8

Committed by Ilya Dryomov on Jan 31, 2014

Factor out logic from ceph_osdc_start_request() into a new helper,
__ceph_osdc_start_request().  ceph_osdc_start_request() now amounts to
taking locks and calling __ceph_osdc_start_request().

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

0bbfdfe8

04 Feb, 2014 1 commit

libceph: fix error handling in ceph_osdc_init() · c172ec5c

Committed by Ilya Dryomov on Jan 31, 2014

msgpool_op_reply message pool isn't destroyed if workqueue construction
fails.  Fix it.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

c172ec5c

28 Jan, 2014 4 commits

libceph: follow redirect replies from osds · 205ee118

Committed by Ilya Dryomov on Jan 27, 2014

Follow redirect replies from osds, for details see ceph.git commit
fbbe3ad1220799b7bb00ea30fce581c5eadaf034.

v1 (current) version of redirect reply consists of oloc and oid, which
expands to pool, key, nspace, hash and oid.  However, server-side code
that would populate anything other than pool doesn't exist yet, and
hence this commit adds support for pool redirects only.  To make sure
that future server-side updates don't break us, we decode all fields
and, if any of key, nspace, hash or oid have a non-default value, error
out with "corrupt osd_op_reply ..." message.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

205ee118
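
The "decode everything, accept only pool redirects" rule from the commit above might be sketched as follows. This is a user-space illustration, not the libceph decoder: `struct redirect`, `is_set()` and `apply_redirect()` are hypothetical, and the default values (empty strings, hash of -1) are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of a v1 redirect reply. */
struct redirect {
	int64_t pool;
	const char *key;	/* "" when unset (assumed default) */
	const char *nspace;	/* "" when unset (assumed default) */
	int64_t hash;		/* -1 when unset (assumed default) */
	const char *oid;	/* "" when unset (assumed default) */
};

/* A string field counts as set when it is non-NULL and non-empty. */
static int is_set(const char *s)
{
	return s != NULL && s[0] != '\0';
}

/*
 * Accept a redirect only if nothing but the pool is populated.
 * On success store the new pool in *target_pool and return 0;
 * otherwise leave *target_pool untouched and return -EINVAL.
 */
static int apply_redirect(const struct redirect *r, int64_t *target_pool)
{
	assert(r != NULL && target_pool != NULL);
	if (is_set(r->key) || is_set(r->nspace) || is_set(r->oid) ||
	    r->hash != -1)
		return -EINVAL;	/* field this client does not support */
	*target_pool = r->pool;
	return 0;
}
```

Rejecting unknown-but-populated fields up front means a newer server cannot silently steer an older client to the wrong object; the client fails loudly instead.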

libceph: rename ceph_osd_request::r_{oloc,oid} to r_base_{oloc,oid} · 3c972c95

Committed by Ilya Dryomov on Jan 27, 2014

Rename ceph_osd_request::r_{oloc,oid} to r_base_{oloc,oid} before
introducing r_target_{oloc,oid} needed for redirects.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

3c972c95

libceph: follow {read,write}_tier fields on osd request submission · 17a13e40

Committed by Ilya Dryomov on Jan 27, 2014

Overwrite ceph_osd_request::r_oloc.pool with read_tier for read ops and
write_tier for write and read+write ops (aka basic tiering support).
{read,write}_tier are part of pg_pool_t since v9.  This commit bumps
our pg_pool_t decode compat version from v7 to v9; all new fields
except for {read,write}_tier are ignored.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

17a13e40
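
The pool selection described in the commit above can be sketched in a few lines of user-space C. The names (`struct pool_info`, `OP_READ`, `OP_WRITE`, `target_pool()`) and the use of -1 to mean "no tier configured" are assumptions of this sketch, not taken from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OP_READ  0x1u	/* hypothetical flag values */
#define OP_WRITE 0x2u

/* Hypothetical subset of per-pool information. */
struct pool_info {
	int64_t id;
	int64_t read_tier;	/* -1: no read tier (assumed convention) */
	int64_t write_tier;	/* -1: no write tier (assumed convention) */
};

/*
 * Pick the pool a request should be sent to: write_tier for write and
 * read+write ops, read_tier for pure reads, the base pool otherwise.
 */
static int64_t target_pool(const struct pool_info *pi, unsigned int flags)
{
	assert(pi != NULL);
	if ((flags & OP_WRITE) && pi->write_tier >= 0)
		return pi->write_tier;
	if ((flags & OP_READ) && pi->read_tier >= 0)
		return pi->read_tier;
	return pi->id;
}
```

Checking the write flag first is what makes a read+write op follow write_tier, matching the wording of the message.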

libceph: replace ceph_calc_ceph_pg() with ceph_oloc_oid_to_pg() · 7c13cb64

Committed by Ilya Dryomov on Jan 27, 2014

Switch ceph_calc_ceph_pg() to new oloc and oid abstractions and rename
it to ceph_oloc_oid_to_pg() to make its purpose more clear.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

7c13cb64

openeuler / Kernel: synced successfully more than 1 year ago
