Commits · 8a703a383dd3458753e0ad71860ed3a5097692b3 · openeuler / raspberrypi-kernel

03 Nov, 2015 4 commits

libceph: evaluate osd_req_op_data() arguments only once · 8a703a38

Authored by Ioana Ciornei on Oct 22, 2015

This patch changes the osd_req_op_data() macro to not evaluate
arguments more than once in order to follow the kernel coding style.
Signed-off-by: Ioana Ciornei <ciorneiioana@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
[idryomov@gmail.com: changelog, formatting]
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
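
The double-evaluation problem this commit addresses can be illustrated with a small userspace sketch. The types and macro names below are hypothetical stand-ins, not the real osd_req_op_data() definition, and the safer variant relies on GNU statement expressions (GCC/Clang), which is an assumption about the toolchain.

```c
#include <stddef.h>

/* Hypothetical stand-ins, not the real libceph structures. */
struct demo_op { int data; };
struct demo_req { struct demo_op ops[4]; };

/* Unsafe: `which` is expanded twice, so an argument such as `i++`
 * is evaluated twice. */
#define OP_DATA_UNSAFE(req, which) \
	((which) < 4 ? &(req)->ops[(which)].data : NULL)

/* Safer: copy each argument into a local exactly once.
 * Uses a GNU statement expression (GCC/Clang extension). */
#define OP_DATA_ONCE(req, which) ({				\
	struct demo_req *r_ = (req);				\
	unsigned int w_ = (which);				\
	w_ < 4 ? &r_->ops[w_].data : (int *)NULL;		\
})
```

With `OP_DATA_ONCE(&r, i++)` the index is incremented once; with the unsafe variant it is incremented twice and the wrong slot is returned.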

libceph: introduce ceph_x_authorizer_cleanup() · cbf99a11

Authored by Ilya Dryomov on Oct 26, 2015

Commit ae385eaf ("libceph: store session key in cephx authorizer")
introduced ceph_x_authorizer::session_key, but didn't update all the
exit/error paths.  Introduce ceph_x_authorizer_cleanup() to encapsulate
ceph_x_authorizer cleanup and switch to it.  This fixes ceph_x_destroy(),
which currently always leaks key and ceph_x_build_authorizer() error
paths.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>

libceph: use local variable cursor instead of &msg->cursor · 343128ce

Authored by Shraddha Barke on Oct 19, 2015

Use local variable cursor in place of &msg->cursor in
read_partial_msg_data() and write_partial_msg_data().
Signed-off-by: Shraddha Barke <shraddha.6596@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: remove con argument in handle_reply() · 70cf052d

Authored by Shraddha Barke on Oct 18, 2015

Since handle_reply() does not use its con argument, remove it.
Signed-off-by: Shraddha Barke <shraddha.6596@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

16 Oct, 2015 1 commit

rbd: use writefull op for object size writes · e30b7577

Authored by Ilya Dryomov on Oct 07, 2015

This covers only the simplest case - an object size sized write, but
it's still useful in tiering setups when EC is used for the base tier
as writefull op can be proxied, saving an object promotion.

Even though updating ceph_osdc_new_request() to allow writefull should
just be a matter of fixing an assert, I didn't do it because its only
user is cephfs.  All other sites were updated.

Reflects ceph.git commit 7bfb7f9025a8ee0d2305f49bf0336d2424da5b5b.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>

18 Sep, 2015 1 commit

libceph: don't access invalid memory in keepalive2 path · 7f61f545

Authored by Ilya Dryomov on Sep 14, 2015

This

    struct ceph_timespec ceph_ts;
    ...
    con_out_kvec_add(con, sizeof(ceph_ts), &ceph_ts);

wraps ceph_ts into a kvec and adds it to con->out_kvec array, yet
ceph_ts becomes invalid on return from prepare_write_keepalive().  As
a result, we send out bogus keepalive2 stamps.  Fix this by encoding
into a ceph_timespec member, similar to how acks are read and written.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
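
The lifetime issue described in this commit can be sketched in userspace as follows. All types, the `demo_` functions, and the `out_temp_keepalive2` field name are hypothetical stand-ins chosen for illustration; the point is only that the queued pointer refers to storage inside the connection rather than to a local variable that disappears on return.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the messenger types. */
struct demo_timespec { uint32_t tv_sec, tv_nsec; };
struct demo_kvec { const void *base; size_t len; };

struct demo_con {
	struct demo_kvec out_kvec[4];
	int out_kvec_left;
	/* The stamp lives in the connection, so it stays valid
	 * until the queued kvec is actually written out. */
	struct demo_timespec out_temp_keepalive2;
};

static void demo_kvec_add(struct demo_con *con, size_t len, const void *data)
{
	if (con->out_kvec_left >= 4)
		return;		/* array full: drop in this sketch */
	con->out_kvec[con->out_kvec_left].base = data;
	con->out_kvec[con->out_kvec_left].len = len;
	con->out_kvec_left++;
}

/* Queue a keepalive stamp without pointing at a stack variable. */
static void demo_prepare_keepalive(struct demo_con *con,
				   uint32_t sec, uint32_t nsec)
{
	con->out_temp_keepalive2.tv_sec = sec;
	con->out_temp_keepalive2.tv_nsec = nsec;
	demo_kvec_add(con, sizeof(con->out_temp_keepalive2),
		      &con->out_temp_keepalive2);
}
```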

09 Sep, 2015 6 commits

libceph: check data_len in ->alloc_msg() · d15f9d69

Authored by Ilya Dryomov on Sep 02, 2015

Only ->alloc_msg() should check data_len of the incoming message
against the preallocated ceph_msg, doing it in the messenger is not
right.  The contract is that either ->alloc_msg() returns a ceph_msg
which will fit all of the portions of the incoming message, or it
returns NULL and possibly sets skip, signaling whether NULL is due to
an -ENOMEM.  ->alloc_msg() should be the only place where we make the
skip/no-skip decision.

I stumbled upon this while looking at con/osd ref counting.  Right now,
if we get a non-extent message with a larger data portion than we are
prepared for, ->alloc_msg() returns a ceph_msg, and then, when we skip
it in the messenger, we don't put the con/osd ref acquired in
ceph_con_in_msg_alloc() (which is normally put in process_message()),
so this also fixes a memory leak.

An existing BUG_ON in ceph_msg_data_cursor_init() ensures we don't
corrupt random memory should a buggy ->alloc_msg() return an unfit
ceph_msg.

While at it, I changed the "unknown tid" dout() to a pr_warn() to make
sure all skips are seen and unified format strings.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>

libceph: use keepalive2 to verify the mon session is alive · 8b9558aa

Authored by Yan, Zheng on Sep 01, 2015

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: set 'exists' flag for newly up osd · 6dd74e44

Authored by Yan, Zheng on Aug 28, 2015

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: rename con_work() to ceph_con_workfn() · 68931622

Authored by Ilya Dryomov on Jul 03, 2015

Even though it's static, con_work(), being a work func, shows up in
various stacktraces a lot.  Prefix it with ceph_.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: Avoid holding the zero page on ceph_msgr_slab_init errors · d920ff6f

Authored by Benoît Canet on Jun 25, 2015

ceph_msgr_slab_init may fail due to a temporary ENOMEM.

Delay a bit the initialization of zero_page in ceph_msgr_init and
reorder its cleanup in _ceph_msgr_exit so it's done in reverse
order of setup.

BUG_ON() will not suffer to be postponed in case it is triggered.
Signed-off-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: remove the unused macro AES_KEY_SIZE · b79b2368

Authored by Nicholas Krause on Jul 05, 2015

This removes the no longer used macro AES_KEY_SIZE as no functions use
this macro anymore and thus this macro can be removed due to it no longer
being required.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

05 Sep, 2015 1 commit

fs: create and use seq_show_option for escaping · a068acf2

Authored by Kees Cook on Sep 04, 2015

Many file systems that implement the show_options hook fail to correctly
escape their output which could lead to unescaped characters (e.g.  new
lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files.  This
could lead to confusion, spoofed entries (resulting in things like
systemd issuing false d-bus "mount" notifications), and who knows what
else.  This looks like it would only be the root user stepping on
themselves, but it's possible weird things could happen in containers or
in other situations with delegated mount privileges.

Here's an example using overlay with setuid fusermount trusting the
contents of /proc/mounts (via the /etc/mtab symlink).  Imagine the use
of "sudo" is something more sneaky:

  $ BASE="ovl"
  $ MNT="$BASE/mnt"
  $ LOW="$BASE/lower"
  $ UP="$BASE/upper"
  $ WORK="$BASE/work/ 0 0
  none /proc fuse.pwn user_id=1000"
  $ mkdir -p "$LOW" "$UP" "$WORK"
  $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt
  $ cat /proc/mounts
  none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0
  none /proc fuse.pwn user_id=1000 0 0
  $ fusermount -u /proc
  $ cat /proc/mounts
  cat: /proc/mounts: No such file or directory

This fixes the problem by adding new seq_show_option and
seq_show_option_n helpers, and updating the vulnerable show_option
handlers to use them as needed.  Some, like SELinux, need to be open
coded due to unusual existing escape mechanisms.

[akpm@linux-foundation.org: add lost chunk, per Kees]
[keescook@chromium.org: seq_show_option should be using const parameters]
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Jan Kara <jack@suse.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Cc: J. R. Okajima <hooanon05g@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
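
The escaping idea behind the new helpers can be sketched in userspace as below. This is not the kernel implementation: `demo_show_option()` is a hypothetical function, and both the escape set and the three-digit octal output format are assumptions made for the example.

```c
#include <stdio.h>
#include <string.h>

/*
 * Append ",name=value" to out, writing each character of value that
 * appears in esc as a backslash followed by three octal digits.
 * Returns the number of bytes written, or -1 if out is too small.
 */
static int demo_show_option(char *out, size_t size,
			    const char *name, const char *value)
{
	static const char esc[] = ",= \t\n\\";	/* assumed escape set */
	size_t pos;
	int n;

	n = snprintf(out, size, ",%s=", name);
	if (n < 0 || (size_t)n >= size)
		return -1;
	pos = (size_t)n;

	for (; *value; value++) {
		if (strchr(esc, *value)) {
			if (size - pos < 5)	/* "\ooo" plus NUL */
				return -1;
			pos += (size_t)snprintf(out + pos, size - pos, "\\%03o",
						(unsigned int)(unsigned char)*value);
		} else {
			if (size - pos < 2)	/* one byte plus NUL */
				return -1;
			out[pos++] = *value;
			out[pos] = '\0';
		}
	}
	return (int)pos;
}
```

With this sketch, an embedded newline in a value such as the `workdir` string from the example above stays on one output line as `\012` instead of starting a forged entry.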

10 Jul, 2015 2 commits

libceph: treat sockaddr_storage with uninitialized family as blank · c44bd69c

Authored by Ilya Dryomov on Jul 09, 2015

addr_is_blank() should return true if family is neither AF_INET nor
AF_INET6.  This is what its counterpart entity_addr_t::is_blank_ip() is
doing and it is the right thing to do: in process_banner() we check if
our address is blank and if it is "learn" it from our peer.  As it is,
we never learn our address and always send out a blank one.  This goes
way back to ceph.git commit dd732cbfc1c9 ("use sockaddr_storage; and
some ipv6 support groundwork") from 2009.

While at it, do not open-code ipv6_addr_any() and use INADDR_ANY
constant instead of 0.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
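
The rule stated in this commit (wildcard address, or a family that is neither AF_INET nor AF_INET6, counts as blank) can be sketched with the POSIX socket headers. `demo_addr_is_blank()` is a hypothetical userspace analogue, not the kernel function.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Userspace sketch: an address is blank if it is the wildcard address
 * or if its family is neither AF_INET nor AF_INET6. */
static bool demo_addr_is_blank(const struct sockaddr_storage *ss)
{
	switch (ss->ss_family) {
	case AF_INET:
		return ((const struct sockaddr_in *)ss)->sin_addr.s_addr ==
		       htonl(INADDR_ANY);
	case AF_INET6:
		return IN6_IS_ADDR_UNSPECIFIED(
			&((const struct sockaddr_in6 *)ss)->sin6_addr);
	default:
		return true;	/* uninitialized or unknown family */
	}
}
```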

libceph: enable ceph in a non-default network namespace · 757856d2

Authored by Ilya Dryomov on Jun 25, 2015

Grab a reference on a network namespace of the 'rbd map' (in case of
rbd) or 'mount' (in case of ceph) process and use that to open sockets
instead of always using init_net and bailing if network namespace is
anything but init_net.  Be careful to not share struct ceph_client
instances between different namespaces and don't add any code in the
!CONFIG_NET_NS case.

This is based on a patch from Hong Zhiguo <zhiguohong@tencent.com>.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>

01 Jul, 2015 1 commit

crush: fix a bug in tree bucket decode · 82cd003a

Authored by Ilya Dryomov on Jun 29, 2015

struct crush_bucket_tree::num_nodes is u8, so ceph_decode_8_safe()
should be used.  -Wconversion catches this, but I guess it went
unnoticed in all the noise it spews.  The actual problem (at least for
common crushmaps) isn't the u32 -> u8 truncation though - it's the
advancement by 4 bytes instead of 1 in the crushmap buffer.

Fixes: http://tracker.ceph.com/issues/2759

Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
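
Why the decode width matters can be shown with two bounds-checked cursor helpers. The `demo_` functions are hypothetical userspace analogues of the ceph_decode_*_safe() helpers: reading a one-byte field with the four-byte helper would leave the cursor three bytes too far along, so every later field is misread.

```c
#include <stddef.h>
#include <stdint.h>

/* Read one byte and advance the cursor by exactly one byte.
 * Returns 0 on success, -1 if the buffer is exhausted. */
static int demo_decode_8_safe(const uint8_t **p, const uint8_t *end,
			      uint8_t *v)
{
	if (end - *p < 1)
		return -1;
	*v = **p;
	*p += 1;
	return 0;
}

/* Read a little-endian 32-bit value and advance by four bytes. */
static int demo_decode_32_safe(const uint8_t **p, const uint8_t *end,
			       uint32_t *v)
{
	const uint8_t *b = *p;

	if (end - b < 4)
		return -1;
	*v = (uint32_t)b[0] | (uint32_t)b[1] << 8 |
	     (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
	*p += 4;
	return 0;
}
```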

30 Jun, 2015 1 commit

libceph: Fix ceph_tcp_sendpage()'s more boolean usage · c2cfa194

Authored by Benoît Canet on Jun 25, 2015

From struct ceph_msg_data_cursor in include/linux/ceph/messenger.h:

    bool    last_piece;     /* current is last piece */

In ceph_msg_data_next():

    *last_piece = cursor->last_piece;

A call to ceph_msg_data_next() is followed by:

    ret = ceph_tcp_sendpage(con->sock, page, page_offset,
                            length, last_piece);

while ceph_tcp_sendpage() is:

    static int ceph_tcp_sendpage(struct socket *sock, struct page *page,
                                 int offset, size_t size, bool more)

The logic is inverted: correct it.
Signed-off-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
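
The inversion is easy to see in a reduced sketch: "this is the last piece" and "more data follows" are opposites, so the caller has to negate the flag. The `demo_` types and functions below are hypothetical stand-ins, not the messenger code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the socket: records the hint it was given. */
struct demo_sock { bool last_more; size_t sent; };

static int demo_sendpage(struct demo_sock *sock, size_t size, bool more)
{
	sock->last_more = more;	/* "more data follows this chunk" */
	sock->sent += size;
	return (int)size;
}

/* Send one piece; `more` is the negation of `last_piece`. */
static int demo_send_piece(struct demo_sock *sock, size_t length,
			   bool last_piece)
{
	return demo_sendpage(sock, length, !last_piece);
}
```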

25 Jun, 2015 9 commits

libceph: Remove spurious kunmap() of the zero page · 6ba8edc0

Authored by Benoît Canet on Jun 24, 2015

ceph_tcp_sendpage already does the work of mapping/unmapping
the zero page if needed.
Signed-off-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

crush: sync up with userspace · b459be73

Authored by Ilya Dryomov on Jun 12, 2015

.. up to ceph.git commit 1db1abc8328d ("crush: eliminate ad hoc diff
between kernel and userspace").  This fixes a bunch of recently pulled
coding style issues and makes includes a bit cleaner.

A patch "crush:Make the function crush_ln static" from Nicholas Krause
<xerofoify@gmail.com> is folded in as crush_ln() has been made static
in userspace as well.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

crush: fix crash from invalid 'take' argument · 8f529795

Authored by Ilya Dryomov on Jun 12, 2015

Verify that the 'take' argument is a valid device or bucket.
Otherwise ignore it (do not add the value to the working vector).

Reflects ceph.git commit 9324d0a1af61e1c234cc48e2175b4e6320fff8f4.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: fix wrong name "Ceph filesystem for Linux" · 6c13a6bb

Authored by Hong Zhiguo on Jun 10, 2015

modinfo libceph prints the module name "Ceph filesystem for Linux",
which is the same as the real fs module ceph. It's confusing.
Signed-off-by: Hong Zhiguo <zhiguohong@tencent.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: a couple tweaks for wait loops · 216639dd

Authored by Ilya Dryomov on May 19, 2015

- return -ETIMEDOUT instead of -EIO in case of timeout
- wait_event_interruptible_timeout() returns time left until timeout
  and since it can be almost LONG_MAX we had better assign it to long
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>

libceph: store timeouts in jiffies, verify user input · a319bf56

Authored by Ilya Dryomov on May 15, 2015

There are currently three libceph-level timeouts that the user can
specify on mount: mount_timeout, osd_idle_ttl and osdkeepalive.  All of
these are in seconds and no checking is done on user input: negative
values are accepted, we multiply them all by HZ which may or may not
overflow, arbitrarily large jiffies then get added together, etc.

There is also a bug in the way mount_timeout=0 is handled.  It's
supposed to mean "infinite timeout", but that's not how wait.h APIs
treat it and so __ceph_open_session() for example will busy loop
without much chance of being interrupted if none of ceph-mons are
there.

Fix all this by verifying user input, storing timeouts capped by
msecs_to_jiffies() in jiffies and using the new ceph_timeout_jiffies()
helper for all user-specified waits to handle infinite timeouts
correctly.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
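
The two ideas in this commit, validating user-supplied seconds before converting them to ticks and mapping a stored 0 to "wait forever", can be sketched as below. The tick rate, the cap, and the `demo_` function names are assumptions for the example; the real code uses jiffies, msecs_to_jiffies() and ceph_timeout_jiffies().

```c
#include <limits.h>
#include <stdbool.h>

#define DEMO_HZ 100UL			/* assumed tick rate */
#define DEMO_MAX_TIMEOUT LONG_MAX	/* stands in for "wait forever" */

/* Validate a timeout given in seconds and convert it to ticks.
 * Returns false for negative values or values that would overflow
 * the (assumed) INT_MAX cap. */
static bool demo_parse_timeout(long secs, unsigned long *ticks)
{
	if (secs < 0 || (unsigned long)secs > (unsigned long)INT_MAX / DEMO_HZ)
		return false;
	*ticks = (unsigned long)secs * DEMO_HZ;
	return true;
}

/* Map a stored timeout of 0 to "infinite" for the wait primitives. */
static long demo_timeout_ticks(unsigned long ticks)
{
	return ticks ? (long)ticks : DEMO_MAX_TIMEOUT;
}
```

Keeping the 0-to-infinite mapping in one helper means each wait site passes the stored value through it instead of special-casing 0 locally.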

libceph: use kvfree() instead of open-coding it · b01da6a0

Authored by Ilya Dryomov on May 04, 2015

This one sneaked in through vfs tree with commit 2b777c9d
("ceph_sync_read: stop poking into iov_iter guts").
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: allow setting osd_req_op's flags · 144cba14

Authored by Yan, Zheng on Apr 27, 2015

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>

libceph: properly release STAT request's raw_data_in · 66ba609f

Authored by Yan, Zheng on Apr 27, 2015

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>

21 May, 2015 2 commits

Revert "libceph: clear r_req_lru_item in __unregister_linger_request()" · 521a04d0

Authored by Ilya Dryomov on May 11, 2015

This reverts commit ba9d114e.

.. which introduced a regression that prevented all lingering requests
requeued in kick_requests() from ever being sent to the OSDs, resulting
in a lot of missed notifies.  In retrospect it's pretty obvious that
r_req_lru_item item in the case of lingering requests can be used not
only for notarget, but also for unsent linkage due to how tightly
actual map and enqueue operations are coupled in __map_request().

The assertion that was being silenced is taken care of in the previous
("libceph: request a new osdmap if lingering request maps to no osd")
commit: by always kicking homeless lingering requests we ensure that
none of them ends up on the notarget list outside of the critical
section guarded by request_mutex.

Cc: stable@vger.kernel.org # 3.18+, needs b0494532 "libceph: request a new osdmap if lingering request maps to no osd"
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>

libceph: request a new osdmap if lingering request maps to no osd · b0494532

Authored by Ilya Dryomov on May 11, 2015

This commit does two things.  First, if there are any homeless
lingering requests, we now request a new osdmap even if the osdmap that
is being processed brought no changes, i.e. if a given lingering
request turned homeless in one of the previous epochs and remained
homeless in the current epoch.  Not doing so leaves us with a stale
osdmap and as a result we may miss our window for reestablishing the
watch and lose notifies.

MON=1 OSD=1:

    # cat linger-needmap.sh
    #!/bin/bash
    rbd create --size 1 test
    DEV=$(rbd map test)
    ceph osd out 0
    rbd map dne/dne # obtain a new osdmap as a side effect (!)
    sleep 1
    ceph osd in 0
    rbd resize --size 2 test
    # rbd info test | grep size -> 2M
    # blockdev --getsize $DEV -> 1M

N.B.: Not obtaining a new osdmap in between "osd out" and "osd in"
above is enough to make it miss that resize notify, but that is a
bug^Wlimitation of ceph watch/notify v1.

Second, homeless lingering requests are now kicked just like those
lingering requests whose mapping has changed.  This is mainly to
recognize that a homeless lingering request makes no sense and to
preserve the invariant that a registered lingering request is not
sitting on any of r_req_lru_item lists.  This spares us a WARN_ON,
which commit ba9d114e ("libceph: clear r_req_lru_item in
__unregister_linger_request()") tried to fix the _wrong_ way.

Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>

11 May, 2015 1 commit

net: Add a struct net parameter to sock_create_kern · eeb1bd5c

Authored by Eric W. Biederman on May 08, 2015

This is long overdue, and is part of cleaning up how we allocate kernel
sockets that don't reference count struct net.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

22 Apr, 2015 3 commits

crush: straw2 bucket type with an efficient 64-bit crush_ln() · 958a2765

Authored by Ilya Dryomov on Apr 14, 2015

This is an improved straw bucket that correctly avoids any data movement
between items A and B when neither A nor B's weights are changed.  Said
differently, if we adjust the weight of item C (including adding it anew
or removing it completely), we will only see inputs move to or from C,
never between other items in the bucket.

Notably, there is no intermediate scaling factor that needs to be
calculated.  The mapping function is a simple function of the item weights.

The below commits were squashed together into this one (mostly to avoid
adding and then yanking a ~6000 lines worth of crush_ln_table):

- crush: add a straw2 bucket type
- crush: add crush_ln to calculate nature log efficently
- crush: improve straw2 adjustment slightly
- crush: change crush_ln to provide 32 more digits
- crush: fix crush_get_bucket_item_weight and bucket destroy for straw2
- crush/mapper: fix divide-by-0 in straw2
  (with div64_s64() for draw = ln / w and INT64_MIN -> S64_MIN - need
   to create a proper compat.h in ceph.git)

Reflects ceph.git commits 242293c908e923d474910f2b8203fa3b41eb5a53,
                          32a1ead92efcd351822d22a5fc37d159c65c1338,
                          6289912418c4a3597a11778bcf29ed5415117ad9,
                          35fcb04e2945717cf5cfe150b9fa89cb3d2303a1,
                          6445d9ee7290938de1e4ee9563912a6ab6d8ee5f,
                          b5921d55d16796e12d66ad2c4add7305f9ce2353.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

crush: ensuring at most num-rep osds are selected · 45002267

Authored by Ilya Dryomov on Apr 14, 2015

Crush temporary buffers are allocated as per replica size configured
by the user.  When there are more final osds (to be selected as per
rule) than the replicas, buffer overlaps and it causes crash.  Now, it
ensures that at most num-rep osds are selected even if more number of
osds are allowed by the rule.

Reflects ceph.git commits 6b4d1aa99718e3b367496326c1e64551330fabc0,
                          234b066ba04976783d15ff2abc3e81b6cc06fb10.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

crush: drop unnecessary include from mapper.c · 9be6df21

Authored by Ilya Dryomov on Apr 14, 2015

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

20 Apr, 2015 3 commits

libceph: expose client options through debugfs · 5cf7bd30

Authored by Ilya Dryomov on Mar 25, 2015

Add a client_options attribute for showing libceph options.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph, ceph: split ceph_show_options() · ff40f9ae

Authored by Ilya Dryomov on Mar 25, 2015

Split ceph_show_options() into two pieces and move the piece
responsible for printing client (libceph) options into net/ceph.  This
way people adding a libceph option wouldn't have to remember to update
code in fs/ceph.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: don't overwrite specific con error msgs · 67c64eb7

Authored by Ilya Dryomov on Mar 23, 2015

- specific con->error_msg messages (e.g. "protocol version mismatch")
  end up getting overwritten by a catch-all "socket error on read
  / write", introduced in commit 3a140a0d ("libceph: report socket
  read/write error message")
- "bad message sequence # for incoming message" loses to "bad crc" due
  to the fact that -EBADMSG is used for both

Fix it, and tidy up con->error_msg assignments and pr_errs while at it.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

08 Apr, 2015 1 commit

Revert "libceph: use memalloc flags for net IO" · 6d7fdb0a

Authored by Ilya Dryomov on Apr 02, 2015

This reverts commit 89baaa57.

Dirty page throttling should be sufficient for us in the general case
so there is no need to use __GFP_MEMALLOC - it would be needed only in
the swap-over-rbd case, which we currently don't support.  (It would
probably take approximately the commit that is being reverted to add
that support, but we would also need the "swap" option to distinguish
from the general case and make sure swap ceph_client-s aren't shared
with anything else.)  See ceph-devel threads [1] and [2] for the
details of why enabling pfmemalloc reserves for all cases is a bad
thing.

On top of potential system lockups related to drained emergency
reserves, this turned out to cause ceph lockups in case peers are on
the same host and communicating via loopback due to sk_filter()
dropping pfmemalloc skbs on the receiving side because the receiving
loopback socket is not tagged with SOCK_MEMALLOC.

[1] "SOCK_MEMALLOC vs loopback"
    http://www.spinics.net/lists/ceph-devel/msg22998.html
[2] "[PATCH] libceph: don't set memalloc flags in loopback case"
    http://www.spinics.net/lists/ceph-devel/msg23392.html

Conflicts:
	net/ceph/messenger.c [ context: tcp_nodelay option ]

Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Sage Weil <sage@redhat.com>
Cc: stable@vger.kernel.org # 3.18+, needs backporting
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: Mel Gorman <mgorman@suse.de>

19 Feb, 2015 4 commits

libceph: kfree() in put_osd() shouldn't depend on authorizer · b28ec2f3

Authored by Ilya Dryomov on Feb 16, 2015

a255651d ("ceph: ensure auth ops are defined before use") made
kfree() in put_osd() conditional on the authorizer.  A mechanical
mistake most likely - fix it.

Cc: Alex Elder <elder@linaro.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>

libceph: fix double __remove_osd() problem · 7eb71e03

Authored by Ilya Dryomov on Feb 17, 2015

It turns out it's possible to get __remove_osd() called twice on the
same OSD.  That doesn't sit well with rb_erase() - depending on the
shape of the tree we can get a NULL dereference, a soft lockup or
a random crash at some point in the future as we end up touching freed
memory.  One scenario that I was able to reproduce is as follows:

            <osd3 is idle, on the osd lru list>
<con reset - osd3>
con_fault_finish()
  osd_reset()
                              <osdmap - osd3 down>
                              ceph_osdc_handle_map()
                                <takes map_sem>
                                kick_requests()
                                  <takes request_mutex>
                                  reset_changed_osds()
                                    __reset_osd()
                                      __remove_osd()
                                  <releases request_mutex>
                                <releases map_sem>
    <takes map_sem>
    <takes request_mutex>
    __kick_osd_requests()
      __reset_osd()
        __remove_osd() <-- !!!

A case can be made that osd refcounting is imperfect and reworking it
would be a proper resolution, but for now Sage and I decided to fix
this by adding a safeguard around __remove_osd().

Fixes: http://tracker.ceph.com/issues/8087

Cc: Sage Weil <sage@redhat.com>
Cc: stable@vger.kernel.org # 3.9+: 7c6e6fc5: libceph: assert both regular and lingering lists in __remove_osd()
Cc: stable@vger.kernel.org # 3.9+: cc9f1f51: libceph: change from BUG to WARN for __remove_osd() asserts
Cc: stable@vger.kernel.org # 3.9+
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
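
One way to make a removal routine tolerate being called twice is to mark the node as unlinked on the first call and return early on the second. The sketch below uses a self-pointing list node as that marker; it is a hypothetical userspace analogue with made-up `demo_` names, and a linked list stands in for the rb-tree used by the real code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in: a list node plays the role of the rb-tree node. */
struct demo_node { struct demo_node *prev, *next; };
struct demo_osd { struct demo_node node; int removals; };

static void demo_node_init(struct demo_node *n)
{
	n->prev = n->next = n;		/* self-pointing means "unlinked" */
}

static bool demo_node_unlinked(const struct demo_node *n)
{
	return n->next == n;
}

static void demo_insert(struct demo_node *head, struct demo_node *n)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Remove the osd from the container; a second call is a no-op. */
static void demo_remove_osd(struct demo_osd *osd)
{
	if (demo_node_unlinked(&osd->node))
		return;			/* already removed: the guard */
	osd->node.prev->next = osd->node.next;
	osd->node.next->prev = osd->node.prev;
	demo_node_init(&osd->node);	/* mark as unlinked */
	osd->removals++;
}
```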

libceph: tcp_nodelay support · ba988f87

Authored by Chaitanya Huilgol on Jan 23, 2015

Setting the TCP_NODELAY socket option on connection sockets disables
Nagle’s algorithm and improves latency characteristics.  The
tcp_nodelay (default) and notcp_nodelay option flags are provided to
enable and disable setting the socket option.
Signed-off-by: Chaitanya Huilgol <chaitanya.huilgol@sandisk.com>
[idryomov@redhat.com: NO_TCP_NODELAY -> TCP_NODELAY, minor adjustments]
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>

libceph: use mon_client.c/put_generic_request() more · f646912d

Authored by Ilya Dryomov on Dec 22, 2014

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>