Commits · 8a703a383dd3458753e0ad71860ed3a5097692b3 · openeuler / raspberrypi-kernel

03 Nov, 2015 · 2 commits

libceph: evaluate osd_req_op_data() arguments only once · 8a703a38

Authored by Ioana Ciornei on Oct 22, 2015

This patch changes the osd_req_op_data() macro to not evaluate
arguments more than once in order to follow the kernel coding style.

Signed-off-by: Ioana Ciornei <ciorneiioana@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
[idryomov@gmail.com: changelog, formatting]
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
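
The multiple-evaluation hazard that this patch addresses can be sketched in user space as follows. This is only an illustration, not the actual osd_req_op_data() definition: `SQUARE_UNSAFE`, `SQUARE_ONCE` and `next_value()` are hypothetical names, and the safe variant assumes a compiler with GNU C statement expressions and `__typeof__` (gcc or clang).

```c
#include <assert.h>

/* Hypothetical counter: records how often the argument gets evaluated. */
static int calls;

static int next_value(void)
{
    calls++;
    return 3;
}

/* Unsafe: x appears twice in the expansion, so it is evaluated twice. */
#define SQUARE_UNSAFE(x) ((x) * (x))

/* Safer: evaluate x once into a local, then use only the local. */
#define SQUARE_ONCE(x) ({           \
    __typeof__(x) x_ = (x);         \
    x_ * x_;                        \
})

/* Number of next_value() calls made by a single SQUARE_UNSAFE expansion. */
static int count_calls_unsafe(void)
{
    calls = 0;
    (void)SQUARE_UNSAFE(next_value());
    return calls;
}

/* Number of next_value() calls made by a single SQUARE_ONCE expansion. */
static int count_calls_once(void)
{
    calls = 0;
    (void)SQUARE_ONCE(next_value());
    return calls;
}

/* Usage: only the unsafe macro evaluates its argument twice. */
static void demo_macro_evaluation(void)
{
    assert(count_calls_unsafe() == 2);
    assert(count_calls_once() == 1);
}
```

With side-effecting arguments such as `i++` or a function call, the unsafe form silently changes behavior, which is why single evaluation matters.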


libceph: remove con argument in handle_reply() · 70cf052d

Authored by Shraddha Barke on Oct 18, 2015

Since handle_reply() does not use its con argument, remove it.

Signed-off-by: Shraddha Barke <shraddha.6596@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

16 Oct, 2015 · 1 commit

rbd: use writefull op for object size writes · e30b7577

Authored by Ilya Dryomov on Oct 07, 2015

This covers only the simplest case - an object-size write - but it's
still useful in tiering setups when EC is used for the base tier, as
the writefull op can be proxied, saving an object promotion.

Even though updating ceph_osdc_new_request() to allow writefull should
just be a matter of fixing an assert, I didn't do it because its only
user is cephfs.  All other sites were updated.

Reflects ceph.git commit 7bfb7f9025a8ee0d2305f49bf0336d2424da5b5b.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>

09 Sep, 2015 · 1 commit

libceph: check data_len in ->alloc_msg() · d15f9d69

Authored by Ilya Dryomov on Sep 02, 2015

Only ->alloc_msg() should check data_len of the incoming message
against the preallocated ceph_msg; doing it in the messenger is not
right.  The contract is that either ->alloc_msg() returns a ceph_msg
which will fit all of the portions of the incoming message, or it
returns NULL and possibly sets skip, signaling whether NULL is due to
an -ENOMEM.  ->alloc_msg() should be the only place where we make the
skip/no-skip decision.

I stumbled upon this while looking at con/osd ref counting.  Right now,
if we get a non-extent message with a larger data portion than we are
prepared for, ->alloc_msg() returns a ceph_msg, and then, when we skip
it in the messenger, we don't put the con/osd ref acquired in
ceph_con_in_msg_alloc() (which is normally put in process_message()),
so this also fixes a memory leak.

An existing BUG_ON in ceph_msg_data_cursor_init() ensures we don't
corrupt random memory should a buggy ->alloc_msg() return an unfit
ceph_msg.

While at it, I changed the "unknown tid" dout() to a pr_warn() to make
sure all skips are seen, and unified the format strings.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
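
The contract described above (return a message that fits, or return NULL and say via skip whether the message should be skipped) can be sketched in user space. Everything here is a simplified assumption for illustration: `struct msg`, `alloc_msg_sketch()` and `skip_decision()` are hypothetical stand-ins, not the libceph types or callbacks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct ceph_msg. */
struct msg {
    size_t data_cap;    /* bytes of data the message can hold */
};

/*
 * Hypothetical ->alloc_msg()-style callback.  It either returns a message
 * that fits data_len, or returns NULL.  On NULL, *skip tells the caller
 * whether the incoming message should be skipped (1) or whether the NULL
 * is due to an allocation failure (0).
 */
static struct msg *alloc_msg_sketch(struct msg *prealloc, size_t data_len,
                                    int *skip)
{
    struct msg *m;

    *skip = 0;

    if (prealloc) {
        if (data_len > prealloc->data_cap) {
            /* Too big for the preallocated message: skip it. */
            *skip = 1;
            return NULL;
        }
        return prealloc;
    }

    /* No preallocated message: allocate one sized for the data. */
    m = malloc(sizeof(*m));
    if (!m)
        return NULL;    /* allocation failure, skip stays 0 */
    m->data_cap = data_len;
    return m;
}

/* Usage: the skip decision for a given preallocated capacity and length. */
static int skip_decision(size_t cap, size_t data_len)
{
    struct msg pre = { .data_cap = cap };
    int skip;

    (void)alloc_msg_sketch(&pre, data_len, &skip);
    return skip;
}

static void demo_alloc_msg(void)
{
    assert(skip_decision(16, 8) == 0);   /* fits: no skip */
    assert(skip_decision(16, 32) == 1);  /* too large: skip */
}
```

Keeping the size check inside the callback means the caller never has to second-guess a non-NULL return, which is the point of making it the only place where the skip/no-skip decision is made.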


25 Jun, 2015 · 3 commits

libceph: store timeouts in jiffies, verify user input · a319bf56

Authored by Ilya Dryomov on May 15, 2015

There are currently three libceph-level timeouts that the user can
specify on mount: mount_timeout, osd_idle_ttl and osdkeepalive.  All of
these are in seconds and no checking is done on user input: negative
values are accepted, we multiply them all by HZ which may or may not
overflow, arbitrarily large jiffies then get added together, etc.

There is also a bug in the way mount_timeout=0 is handled.  It's
supposed to mean "infinite timeout", but that's not how wait.h APIs
treat it and so __ceph_open_session() for example will busy loop
without much chance of being interrupted if none of ceph-mons are
there.

Fix all this by verifying user input, storing timeouts capped by
msecs_to_jiffies() in jiffies and using the new ceph_timeout_jiffies()
helper for all user-specified waits to handle infinite timeouts
correctly.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
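
A minimal user-space sketch of the two ideas above: reject bad input before converting seconds to ticks, and map a stored timeout of 0 to "wait forever". The names `secs_to_ticks()`, `effective_timeout()`, `HZ_SKETCH` and `INFINITE_TIMEOUT` are hypothetical, the tick rate is an assumption, and the bound used for validation is illustrative rather than taken from the patch.

```c
#include <assert.h>
#include <limits.h>

#define HZ_SKETCH 250                /* assumed tick rate, for illustration */
#define INFINITE_TIMEOUT LONG_MAX    /* stand-in for an "infinite" wait */

/*
 * Validate a user-supplied timeout in seconds and convert it to ticks.
 * Returns -1 for negative values or values large enough to risk overflow
 * once scaled (the bound is an illustrative choice).
 */
static long secs_to_ticks(long secs)
{
    if (secs < 0 || secs > INT_MAX / 1000)
        return -1;
    return secs * HZ_SKETCH;
}

/* A stored timeout of 0 means "infinite", so never pass 0 to a wait. */
static long effective_timeout(long ticks)
{
    return ticks ? ticks : INFINITE_TIMEOUT;
}

static void demo_timeouts(void)
{
    assert(secs_to_ticks(-5) == -1);          /* negative input rejected */
    assert(secs_to_ticks(2) == 500);          /* 2 s at 250 ticks/s */
    assert(effective_timeout(0) == LONG_MAX); /* 0 means no timeout */
    assert(effective_timeout(500) == 500);
}
```

Passing 0 straight to a timed wait would expire immediately, which is how the busy loop described above comes about.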


libceph: allow setting osd_req_op's flags · 144cba14

Authored by Yan, Zheng on Apr 27, 2015

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>

libceph: properly release STAT request's raw_data_in · 66ba609f

Authored by Yan, Zheng on Apr 27, 2015

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>

21 May, 2015 · 2 commits

Revert "libceph: clear r_req_lru_item in __unregister_linger_request()" · 521a04d0

Authored by Ilya Dryomov on May 11, 2015

This reverts commit ba9d114e.

.. which introduced a regression that prevented all lingering requests
requeued in kick_requests() from ever being sent to the OSDs, resulting
in a lot of missed notifies.  In retrospect it's pretty obvious that
r_req_lru_item item in the case of lingering requests can be used not
only for notarget, but also for unsent linkage due to how tightly
actual map and enqueue operations are coupled in __map_request().

The assertion that was being silenced is taken care of in the previous
("libceph: request a new osdmap if lingering request maps to no osd")
commit: by always kicking homeless lingering requests we ensure that
none of them ends up on the notarget list outside of the critical
section guarded by request_mutex.

Cc: stable@vger.kernel.org # 3.18+, needs b0494532 "libceph: request a new osdmap if lingering request maps to no osd"
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>


libceph: request a new osdmap if lingering request maps to no osd · b0494532

Authored by Ilya Dryomov on May 11, 2015

This commit does two things.  First, if there are any homeless
lingering requests, we now request a new osdmap even if the osdmap that
is being processed brought no changes, i.e. if a given lingering
request turned homeless in one of the previous epochs and remained
homeless in the current epoch.  Not doing so leaves us with a stale
osdmap and as a result we may miss our window for reestablishing the
watch and lose notifies.

MON=1 OSD=1:

    # cat linger-needmap.sh
    #!/bin/bash
    rbd create --size 1 test
    DEV=$(rbd map test)
    ceph osd out 0
    rbd map dne/dne # obtain a new osdmap as a side effect (!)
    sleep 1
    ceph osd in 0
    rbd resize --size 2 test
    # rbd info test | grep size -> 2M
    # blockdev --getsize $DEV -> 1M

N.B.: Not obtaining a new osdmap in between "osd out" and "osd in"
above is enough to make it miss that resize notify, but that is a
bug^Wlimitation of ceph watch/notify v1.

Second, homeless lingering requests are now kicked just like those
lingering requests whose mapping has changed.  This is mainly to
recognize that a homeless lingering request makes no sense and to
preserve the invariant that a registered lingering request is not
sitting on any of r_req_lru_item lists.  This spares us a WARN_ON,
which commit ba9d114e ("libceph: clear r_req_lru_item in
__unregister_linger_request()") tried to fix the _wrong_ way.

Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>


19 Feb, 2015 · 2 commits

libceph: kfree() in put_osd() shouldn't depend on authorizer · b28ec2f3

Authored by Ilya Dryomov on Feb 16, 2015

a255651d ("ceph: ensure auth ops are defined before use") made
kfree() in put_osd() conditional on the authorizer.  A mechanical
mistake most likely - fix it.

Cc: Alex Elder <elder@linaro.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>

libceph: fix double __remove_osd() problem · 7eb71e03

Authored by Ilya Dryomov on Feb 17, 2015

It turns out it's possible to get __remove_osd() called twice on the
same OSD.  That doesn't sit well with rb_erase() - depending on the
shape of the tree we can get a NULL dereference, a soft lockup or
a random crash at some point in the future as we end up touching freed
memory.  One scenario that I was able to reproduce is as follows:

            <osd3 is idle, on the osd lru list>
<con reset - osd3>
con_fault_finish()
  osd_reset()
                              <osdmap - osd3 down>
                              ceph_osdc_handle_map()
                                <takes map_sem>
                                kick_requests()
                                  <takes request_mutex>
                                  reset_changed_osds()
                                    __reset_osd()
                                      __remove_osd()
                                  <releases request_mutex>
                                <releases map_sem>
    <takes map_sem>
    <takes request_mutex>
    __kick_osd_requests()
      __reset_osd()
        __remove_osd() <-- !!!

A case can be made that osd refcounting is imperfect and reworking it
would be a proper resolution, but for now Sage and I decided to fix
this by adding a safeguard around __remove_osd().

Fixes: http://tracker.ceph.com/issues/8087

Cc: Sage Weil <sage@redhat.com>
Cc: stable@vger.kernel.org # 3.9+: 7c6e6fc5: libceph: assert both regular and lingering lists in __remove_osd()
Cc: stable@vger.kernel.org # 3.9+: cc9f1f51: libceph: change from BUG to WARN for __remove_osd() asserts
Cc: stable@vger.kernel.org # 3.9+
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
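
One common way to make a removal safe against being called twice is to check whether the element is still linked and to mark it unlinked afterwards. The sketch below shows that idea on a small circular list in user space. It is an analogy only: the OSDs in the message above live in an rbtree, and `struct node`, `node_remove_once()` and the other names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list node (hypothetical). */
struct node {
    struct node *prev, *next;
};

static void node_init(struct node *n)
{
    n->prev = n->next = n;    /* self-linked means "not on any list" */
}

static int node_unlinked(const struct node *n)
{
    return n->next == n;
}

static void node_add(struct node *n, struct node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/*
 * Guarded removal: unlink only if the node is still linked, then mark it
 * unlinked so that a second call is a harmless no-op.
 * Returns 1 if this call removed the node, 0 otherwise.
 */
static int node_remove_once(struct node *n)
{
    if (node_unlinked(n))
        return 0;
    n->prev->next = n->next;
    n->next->prev = n->prev;
    node_init(n);
    return 1;
}

/* Usage: remove the same node twice; only the first call does any work. */
static int demo_double_remove(void)
{
    struct node head, osd;
    int first, second;

    node_init(&head);
    node_init(&osd);
    node_add(&osd, &head);
    first = node_remove_once(&osd);
    second = node_remove_once(&osd);
    return first == 1 && second == 0 && node_unlinked(&head);
}

static void demo_guarded_removal(void)
{
    assert(demo_double_remove() == 1);
}
```

Without the guard, the second call would rewrite pointers through stale neighbors, which is the kind of corruption the message describes for rb_erase().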


18 Dec, 2014 · 4 commits

libceph: specify position of extent operation · 715e4cd4

Authored by Yan, Zheng on Nov 13, 2014

Allow specifying the position of an extent operation in a
multi-operation osd request.  This is required for cephfs to convert
inline data to normal data (compare xattr, then write object).

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>

libceph: add CREATE osd operation support · 864e9197

Authored by Yan, Zheng on Nov 13, 2014

Add CEPH_OSD_OP_CREATE support.  Also change libceph to not treat
CEPH_OSD_OP_DELETE as an extent op and add an assert to that end.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>

libceph: add SETXATTR/CMPXATTR osd operations support · d74b50be

Authored by Yan, Zheng on Nov 12, 2014

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>

libceph: message signature support · 33d07337

Authored by Yan, Zheng on Nov 04, 2014

Signed-off-by: Yan, Zheng <zyan@redhat.com>

14 Nov, 2014 · 3 commits

libceph: change from BUG to WARN for __remove_osd() asserts · cc9f1f51

Authored by Ilya Dryomov on Nov 05, 2014

No reason to use BUG_ON for osd request list assertions.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>

libceph: clear r_req_lru_item in __unregister_linger_request() · ba9d114e

Authored by Ilya Dryomov on Nov 05, 2014

kick_requests() can put linger requests on the notarget list.  This
means we need to clear the much-overloaded req->r_req_lru_item in
__unregister_linger_request() as well, or we get an assertion failure
in ceph_osdc_release_request() - !list_empty(&req->r_req_lru_item).

AFAICT the assumption was that registered linger requests cannot be on
any of req->r_req_lru_item lists, but that's clearly not the case.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>

libceph: unlink from o_linger_requests when clearing r_osd · a390de02

Authored by Ilya Dryomov on Nov 04, 2014

Requests have to be unlinked from both osd->o_requests (normal
requests) and osd->o_linger_requests (linger requests) lists when
clearing req->r_osd.  Otherwise __unregister_linger_request() gets
confused and we trip over a !list_empty(&osd->o_linger_requests)
assert in __remove_osd().

MON=1 OSD=1:

    # cat remove-osd.sh
    #!/bin/bash
    rbd create --size 1 test
    DEV=$(rbd map test)
    ceph osd out 0
    sleep 3
    rbd map dne/dne # obtain a new osdmap as a side effect
    rbd unmap $DEV & # will block
    sleep 3
    ceph osd in 0

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>

15 Oct, 2014 · 5 commits

libceph: sync osd op definitions in rados.h · 70b5bfa3

Authored by Ilya Dryomov on Oct 02, 2014

Bring in missing osd ops and strings, use macros to eliminate multiple
points of maintenance.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>

libceph: don't try checking queue_work() return value · 91883cd2

Authored by Ilya Dryomov on Sep 11, 2014

queue_work() doesn't "fail to queue", it returns false if work was
already on a queue, which can't happen here since we allocate
event_work right before we queue it.  So don't bother at all.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

libceph: Convert pr_warning to pr_warn · b9a67899

Authored by Joe Perches on Sep 09, 2014

Use the more common pr_warn.

Other miscellanea:

o Coalesce formats
o Realign arguments

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>

libceph: resend lingering requests with a new tid · 2cc6128a

Authored by Ilya Dryomov on Sep 03, 2014

Both not yet registered (r_linger && list_empty(&r_linger_item)) and
registered linger requests should use the new tid on resend to avoid
the dup op detection logic on the OSDs, yet we were doing this only for
the "registered" case.  Factor out and simplify the "registered" logic
and use the new helper for the "not registered" case as well.

Fixes: http://tracker.ceph.com/issues/8806

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

libceph: abstract out ceph_osd_request enqueue logic · f671b581

Authored by Ilya Dryomov on Sep 02, 2014

Introduce __enqueue_request() and switch to it.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

08 Jul, 2014 · 9 commits

libceph: nuke ceph_osdc_unregister_linger_request() · 2d05f082

Authored by Ilya Dryomov on Jun 24, 2014

Remove now unused ceph_osdc_unregister_linger_request().

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

libceph: introduce ceph_osdc_cancel_request() · c9f9b93d

Authored by Ilya Dryomov on Jun 19, 2014

Introduce ceph_osdc_cancel_request() intended for canceling requests
from the higher layers (rbd and cephfs).  Because higher layers are in
charge and are supposed to know what and when they are canceling, the
request is not completed, only unref'ed and removed from the libceph
data structures.

__cancel_request() is no longer called before __unregister_request(),
because __unregister_request() unconditionally revokes r_request and
there is no point in trying to do it twice.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

libceph: fix linger request check in __unregister_request() · 4f23409e

Authored by Ilya Dryomov on Jun 20, 2014

We should check whether the request is on the linger request list of
any of the OSDs, not whether the request is registered or not.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

libceph: unregister only registered linger requests · af593064

Authored by Ilya Dryomov on Jun 20, 2014

Linger requests that have not yet been registered should not be
unregistered by __unregister_linger_request().  This messes up the ref
count and leads to a use-after-free.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
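
Why unregistering an unregistered request breaks the ref count can be shown with a small user-space sketch. It assumes, for illustration only, that registration takes exactly one reference; `struct req`, `req_register()` and `req_unregister()` are hypothetical names, not the libceph functions.

```c
#include <assert.h>

/* Hypothetical, simplified request with a reference count. */
struct req {
    int refs;          /* references held on the request */
    int registered;    /* nonzero once registration took its reference */
};

static void req_register(struct req *r)
{
    if (r->registered)
        return;
    r->registered = 1;
    r->refs++;         /* registration holds one reference */
}

/* Drop the registration reference only if registration actually took one. */
static void req_unregister(struct req *r)
{
    if (!r->registered)
        return;        /* never registered: nothing to undo */
    r->registered = 0;
    r->refs--;
}

/* Usage: the caller's own reference must survive in both cases. */
static int refs_after_unregister(int do_register)
{
    struct req r = { .refs = 1, .registered = 0 };    /* caller's reference */

    if (do_register)
        req_register(&r);
    req_unregister(&r);
    return r.refs;
}

static void demo_unregister(void)
{
    assert(refs_after_unregister(0) == 1);  /* guard prevents an extra put */
    assert(refs_after_unregister(1) == 1);  /* register/unregister balance */
}
```

Without the `registered` check, the unregistered case would drop the caller's only reference, and any later access would touch freed memory.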


libceph: assert both regular and lingering lists in __remove_osd() · 7c6e6fc5

Authored by Ilya Dryomov on Jun 18, 2014

It is important that both the regular and lingering request lists are
empty when the OSD is removed.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

libceph: harden ceph_osdc_request_release() a bit · 6562d661

Authored by Ilya Dryomov on Jun 20, 2014

Add some WARN_ONs to alert us when we try to destroy requests that are
still registered.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

libceph: move and add dout()s to ceph_osdc_request_{get,put}() · 9e94af20

Authored by Ilya Dryomov on Jun 20, 2014

Add dout()s to ceph_osdc_request_{get,put}().  Also move them to .c and
turn kref release callback into a static function.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

libceph: add maybe_move_osd_to_lru() and switch to it · bbf37ec3

Authored by Ilya Dryomov on Jun 20, 2014

Abstract out __move_osd_to_lru() logic from __unregister_request() and
__unregister_linger_request().

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

libceph: rename ceph_osd_request::r_linger_osd to r_linger_osd_item · 1d0326b1

Authored by Ilya Dryomov on Jun 20, 2014

So that:

req->r_osd_item --> osd->o_requests list
req->r_linger_osd_item --> osd->o_linger_requests list

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

12 Jun, 2014 · 1 commit

ceph: remove bogus extern · f6479449

Authored by stephen hemminger on Jun 10, 2014

Sparse complained about this bogus extern on definition of
a function.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

05 Apr, 2014 · 2 commits

libceph: return primary from ceph_calc_pg_acting() · 8008ab10

Authored by Ilya Dryomov on Mar 24, 2014

In preparation for adding support for primary_temp, stop assuming
primaryness: add a primary out parameter to ceph_calc_pg_acting() and
change call sites accordingly.  Primary is now specified separately
from the order of osds in the set.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

libceph: split osdmap allocation and decode steps · a2505d63

Authored by Ilya Dryomov on Mar 13, 2014

Split osdmap allocation and initialization into a separate function,
ceph_osdmap_decode().

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

03 Apr, 2014 · 2 commits

libceph: add support for CEPH_OSD_OP_SETALLOCHINT osd op · c647b8a8

Authored by Ilya Dryomov on Feb 25, 2014

This is primarily for rbd's benefit and is supposed to combat
fragmentation:

"... knowing that rbd images have a 4m size, librbd can pass a hint
that will let the osd do the xfs allocation size ioctl on new files so
that they are allocated in 1m or 4m chunks.  We've seen cases where
users with rbd workloads have very high levels of fragmentation in xfs
and this would mitigate that and probably have a pretty nice
performance benefit."

SETALLOCHINT is considered advisory, so our backwards compatibility
mechanism here is to set the FAILOK flag for all SETALLOCHINT ops.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
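
How a "failure is OK" flag keeps an advisory op from failing the whole request can be sketched as below. This models the receiving side under stated assumptions and is not OSD or libceph code: `struct op`, `run_ops()`, the flag value `OP_FLAG_FAILOK_SKETCH` and the error number are all hypothetical choices for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define OP_FLAG_FAILOK_SKETCH 0x2    /* hypothetical flag value */

/* Hypothetical op: the result it would produce, plus its flags. */
struct op {
    int result;            /* 0 on success, negative errno-style on failure */
    unsigned int flags;
};

/*
 * Execute ops in order.  A failing op aborts the request unless it is
 * marked FAILOK, in which case the failure is ignored.
 */
static int run_ops(const struct op *ops, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (ops[i].result < 0 &&
            !(ops[i].flags & OP_FLAG_FAILOK_SKETCH))
            return ops[i].result;
    }
    return 0;
}

/* Usage: an unsupported hint op followed by a successful write. */
static int hint_then_write(unsigned int hint_flags)
{
    const struct op ops[] = {
        { .result = -95, .flags = hint_flags },  /* hint not supported */
        { .result = 0, .flags = 0 },             /* the actual write */
    };

    return run_ops(ops, sizeof(ops) / sizeof(ops[0]));
}

static void demo_failok(void)
{
    assert(hint_then_write(OP_FLAG_FAILOK_SKETCH) == 0);  /* hint ignored */
    assert(hint_then_write(0) == -95);                    /* hint is fatal */
}
```

Marking only the hint as FAILOK means a peer that does not understand it can still complete the write that follows.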


libceph: encode CEPH_OSD_OP_FLAG_* op flags · 7b25bf5f

Authored by Ilya Dryomov on Feb 25, 2014

Encode ceph_osd_op::flags field so that it gets sent over the wire.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

14 Feb, 2014 · 1 commit

net: remove unnecessary return's · 2045ceae

Authored by stephen hemminger on Feb 12, 2014

One of my pet coding style peeves is the practice of
adding extra return; at the end of a function.
Kill several instances of this in network code.

I suppose some coccinelle wizardry could do this automatically.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

08 Feb, 2014 · 2 commits

libceph: take map_sem for read in handle_reply() · ff513ace

Authored by Ilya Dryomov on Feb 03, 2014

Handling redirect replies requires both map_sem and request_mutex.
Taking map_sem unconditionally near the top of handle_reply() avoids
possible race conditions that arise from releasing request_mutex to be
able to acquire map_sem in redirect reply case.  (Lock ordering is:
map_sem, request_mutex, crush_mutex.)

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

libceph: factor out logic from ceph_osdc_start_request() · 0bbfdfe8

Authored by Ilya Dryomov on Jan 31, 2014

Factor out logic from ceph_osdc_start_request() into a new helper,
__ceph_osdc_start_request().  ceph_osdc_start_request() now amounts to
taking locks and calling __ceph_osdc_start_request().

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>